Abstract

Communication of quantized information is frequently followed by a computation. We consider situations of distributed 
functional quantization: distributed quantization of (possibly correlated) sources followed by centralized computation of a function. 
Under smoothness conditions on the sources and function, asymptotically-optimal regular scalar quantizer designs are developed 
to minimize distortion of the computed function. Striking improvements over quantizers designed without consideration of the 
function are possible and are larger in the entropy-constrained setting than in the fixed-rate setting. As extensions to the basic 
analysis, we characterize a large class of functions for which regular quantization suffices, consider certain functions for which 
asymptotic optimality is achieved without arbitrarily fine quantization, and allow limited collaboration between source encoders. 
In the entropy-constrained setting, a single bit communicated between encoders can have an arbitrarily-large effect on functional 
distortion. In contrast, such communication has very little effect in the fixed-rate setting. 
Index Terms



Asymptotic quantization theory, distributed source coding, non-difference distortion measures, optimal point density function, 
rate-distortion theory 

I. Introduction 

Consider a collection of spatially-separated sensors, each measuring a scalar $X_j$, $j = 1, 2, \ldots, n$. As shown in Fig. 1, the measurements are encoded and communicated over rate-limited links to a sink node without any interaction between the sensors. The sink node computes an estimate of the function $g(X_1^n) = g(X_1, X_2, \ldots, X_n)$ from the received data. Coding that exploits statistical dependence among the $X_j$s is commonly called distributed source coding and has been the subject of much research. A complementary concept is to exploit the form of the function $g$ in designing the (separate) coding of each measurement. Restricting to scalar quantization, this distributed functional scalar quantization (DFSQ) problem is the central subject of this paper. Optimal DFSQ can provide performance improvements in addition to any that are rooted in statistical dependence of the $X_j$s; thus for clarity, most examples presented here are for cases with independent $X_j$s.

The term functional source coding (FSC) can be reasonably applied any time the information sink uses an approximation to the evaluated function $g(X_1^n)$ instead of the source variables $X_1^n$ directly. For emphasis, we will refer to approximate representation of $X_1^n$ as ordinary source coding. FSC is a trivial problem when the encoding is centralized; in that case the encoder mapping can be the composition of the function $g$ and a good encoder for the random variable $g(X_1^n)$. With the constraint of distributed encoding, no single encoder can compute the function, and the situation is thus more intricate.

The primary aim of this paper is to develop a high-resolution approach to optimal DFSQ. Here as in ordinary source coding, the high-resolution approach yields optimality among regular quantizers. In ordinary source coding, this is an insignificant limitation because, quite generally, optimal quantizers are regular. For DFSQ, some restrictions on $g$ are needed to ensure that the optimal quantizers are regular. This provides another key contrast to previous work. Using only the graph coloring approach to FSC of Doshi et al. [1] provides no improvement under these restrictions on $g$, so the present work is a complement to [1]. Combining the two approaches is discussed in Section VII.



A. Basic Problem Statement 

An information sink wishes to obtain an estimate of $g(X_1^n)$, where $g : \mathbb{R}^n \to \mathbb{R}$ satisfies some smoothness conditions and the random variables $X_1, X_2, \ldots, X_n$ (denoted more compactly $X_1^n$) have some known joint distribution. The estimate is computed from scalar-quantized values

$$\hat{X}_j = Q_j(X_j), \qquad j = 1, 2, \ldots, n,$$

where $Q_j$ applied to $X_j$ has rate $R_j$. In the fixed-rate (codebook-constrained) setting, this means $Q_j$ has $K_j = 2^{R_j}$ levels; in the variable-rate (entropy-constrained) setting, this means $H(Q_j(X_j)) = R_j$, where $H(\cdot)$ denotes the entropy.
Fig. 1. Distributed functional source coding.



The accuracy of the approximation is measured by the mean-squared error (MSE)

$$D = \mathrm{E}\left[\left(g(X_1^n) - \hat{g}(\hat{X}_1^n)\right)^2\right],$$

where $\hat{g}$ is the estimator function used at the sink. For a given set of rates or a maximum sum rate, we seek designs of the quantizers such that distortion $D$ is minimized. The problem is approached under the standard assumptions for high-resolution analysis [2], and restrictions on $g$ are applied as needed.



B. Ramifications and Extensions 

Unsurprisingly, there are situations in which designing quantizers to minimize $D$ is no different than designing them for low MSEs $\mathrm{E}[(X_j - \hat{X}_j)^2]$, $j = 1, 2, \ldots, n$. Our analysis will show, for example, that there is no advantage from accounting for $g$ when $g$ is linear. However, there are also cases in which the improvement is very large for large values of $n$; examples in Section V display distortion improvement over ordinary source coding by a factor that is polynomial in $n$ in the fixed-rate case and exponential in $n$ in the variable-rate case.

In addition to the basic formulation, we consider extensions that re-examine requirements on $g$ and on the lack of communication among encoders. First, we define a requirement termed equivalence-free that is less restrictive than monotonicity but still guarantees that optimal quantizers are regular at sufficiently high rate. This leads also to some consideration of non-regular quantizers. Next, we explore a situation in which the high-resolution analysis breaks down because there is an interval where the marginal density $f_{X_j}$ is positive but the optimal quantizer for $X_j$ seems to not have fine partitions. This prompts the concept of a don't care interval, a mixture of low- and high-resolution, and connections with [1]. Finally, we allow rate-constrained information $Y_{2 \to 1}$ communicated from encoder 2 to encoder 1 to affect the encoding of $X_1$. We call this chatting and bound its effect on the distortion $D$. In the fixed-rate setting, the reduction in distortion from $Y_{2 \to 1}$ can be no more than if $R_1$ were increased by the same rate; in the variable-rate setting, the reduction in distortion can be arbitrarily large.



C. Structure of Paper 

We start in Section II by discussing several topics with connections to functional quantization. Additionally, we briefly review the high-resolution approximation techniques used in our analysis. In Section III we obtain optimal fixed- and variable-rate functional quantizers for the $n = 1$ case; while not important in practice, this case illustrates the role of monotonicity and smoothness of $g(\cdot)$. Generalizations to arbitrary $n$, under monotonicity restrictions on $g(\cdot)$, are given in Section IV. Some notable examples in Section V are those that show dramatic scaling of distortion with respect to $n$.

The second half of the paper extends the basic theory of Section IV. Section VI addresses the monotonicity restriction and shows that a weaker equivalence-free condition is sufficient for the optimality of the constructions of Section IV to hold. In the process we develop the notion of high-resolution non-regular quantization. In Section VII we consider certain conditions that cause the high-resolution approach to lead to an optimal quantizer for $X_j$ that does not have high resolution over the entire support of $f_{X_j}$. A modified analysis and design procedure yields a "rate amplification" in the variable-rate case. Limited communication between encoders, or chatting, is studied in Section VIII. Concluding comments appear in Section IX.



II. Background 

A. Related Work 

DFSQ lies at the intersection of several problems including quantization, distributed source coding, and non-MSE distortion 
measures. As such, there are many connections to related work. We provide a brief summary of some of these connections 
here. 

Consider the situation depicted in Fig. 1 with $n = 2$. In general, $X_1$ and $X_2$ are random variables with some joint distribution, and $g$ is a function of the two. We arrive at several related topics by considering special cases of this formulation.

• If $g$ is the identity function, we have a general distributed source coding problem that is well-known in the lossless setting [3]. The lossy setting with scalar coding is considered in [4], and the lossy problem with jointly Gaussian sources and MSE distortion was recently solved in [5]. In this situation, the correlation of $X_1$ and $X_2$ is of primary interest.

• In the lossless setting, Han and Kobayashi [6] studied the classification of functions according to whether the rate region is the same as that for the identity function (i.e., the same as the Slepian-Wolf rate region). Their results are conclusive when $n = 2$ and the source alphabets are finite.

• If $g(X_1, X_2) = X_1$ and $R_2$ is unconstrained, then $X_2$ can be viewed as receiver side information available at the decoder. The trade-off between $R_1$ and distortion (of $X_1$ alone) is given by the Wyner-Ziv rate-distortion function [7], [8]. Rebollo-Monedero et al. examined the Wyner-Ziv scenario at high resolution and showed that providing the receiver side information to the encoder yields no improvement in performance [9].

• For general $g$ and $R_2$ unconstrained, the problem has been studied by Feng et al. [10], who provide an assortment of rate-loss bounds on performance.

• Under suitable constraints on the distortion metric, one may also view $X_2$ as receiver side information that determines the distortion measure on $X_1$, drawing a connection to [11] and [12].

• For discrete $X_1$ and $X_2$, the lossless regime has been explored for $R_2$ unconstrained by Orlitsky and Roche [13]. The distributed version of this problem, which involves minimizing the sum-rate $R_1 + R_2$, was later explored by Doshi et al. [1].

• Let $Y = g(X_1, X_2)$. Then $Y$ may be interpreted as a remote source that is observed only through $X_1$ and $X_2$, and we have the remote source multiterminal source coding problem [15].

Interesting related problems have also arisen without a requirement of distributed coding. Rather than having a single function $g$, one may consider a set of functions $\{g_\alpha\}_{\alpha \in A}$ and define

$$D_g = \mathrm{E}\left[ d\left( g_\alpha(X_1^n), \hat{g}_\alpha(\hat{X}_1^n) \right) \right],$$

where $\alpha$ is a random variable taking values in index set $A$. One may consider this a special case of the Wyner-Ziv problem with $\alpha$ as decoder side information and a functional distortion measure. In such a setting, fixed- and variable-rate quantization to minimize MSE was studied by Bucklew [16]. Note that if the function were known deterministically to the encoder, one could do no better than to simply compute the function and encode the result.

Under appropriate constraints on the function $g$, one may consider it as having introduced a locally quadratic distortion measure on the source $X_1^n$. In [17], Linder et al. consider quantization via companding functions for locally quadratic distortion measures. We say more about connections to this work in Section IV-E.

Additionally, quantization with a functional motive bears resemblance to the idea of "task-oriented quantization." There has been considerable work in this direction for detection [18], [19], classification [20], and estimation [21]; see also the review article [22]. The use of a function at the decoder can be seen as inducing a non-MSE distortion measure on the source data. In this sense, a thread may be drawn to perceptual source coding [23], where a non-MSE distortion reflects human sensitivity to audio or video.

B. High-Resolution Approach to Quantizer Design

We first provide an informal summary of the assumptions and approximations that are standard for high-resolution analyses. Then, the optimization of quantizers under these high-resolution approximations is summarized. More technical details and references to original sources may be found in [2].

1) Assumptions and Basic Approximations: Let $X$ be a random variable with probability density function (pdf) $f_X(x)$. Suppose a quantizer $Q$ for $X$ has points $\{\beta_i\}_{i \in I}$ and partition $\{S_i\}_{i \in I}$. For optimality, it is necessary for each set in the partition to be an interval, i.e., the quantizer is regular [24, Sect. 6.2].

The distortion of the quantizer is

$$D_X = \mathrm{E}\left[(X - Q(X))^2\right] = \sum_{i \in I} P(X \in S_i)\, \mathrm{E}\left[(X - \beta_i)^2 \mid X \in S_i\right] \tag{1}$$

by the law of total expectation. The initial aim of high-resolution theory is to express this distortion as an integral involving $f_X$. To that end, we make the following assumptions about the source and quantizer:

HR1. $f_X$ is smooth enough that it may be approximated as constant on each $S_i$. While it is convenient to think of $f_X$ as continuous, it suffices for it to be measurable [2, Sect. IV-A].
HR2. $f_X$ has bounded support or decays sufficiently fast. Sufficient decay is for terms in (1) corresponding to unbounded $S_i$s to make negligible contributions.
HR3. Neighboring cells have approximately equal sizes, except possibly for two semi-infinite boundary cells.
When the number of points $K = |I|$ is large, Assumption HR3 allows one to define a (normalized) point density function $\lambda(x)$ such that $\delta\lambda(x)$ is approximately the fraction of quantizer points in an interval of length $\delta$ centered at $x$. The point density is used to express the lengths of the partition cells:

$$x \in S_i \;\Rightarrow\; \mathrm{length}(S_i) \approx (K\lambda(x))^{-1}. \tag{2}$$

Now we can approximate each non-boundary term in (1). By Assumption HR1, $\beta_i$ should be approximately at the center of $S_i$, and the length of $S_i$ then makes the conditional expectation approximately $\frac{1}{12}(K\lambda(\beta_i))^{-2}$. Invoking Assumption HR1 again, the $i$th term in the sum is $\int_{x \in S_i} \frac{1}{12}(K\lambda(\beta_i))^{-2} f_X(x)\,dx$. Finally, neglecting overload distortion because of Assumption HR2,

$$D_X \approx \int \frac{1}{12K^2\lambda^2(x)} f_X(x)\,dx = \frac{1}{12K^2}\,\mathrm{E}\left[\lambda^{-2}(X)\right]. \tag{3}$$

This approximation holds in the sense that the ratio of the two quantities approaches 1 as the rate increases. This is the meaning of "$\approx$" for the remainder of the paper, except where noted.

In general, the optimal variable-rate quantizer may have an infinite number of points, and this is handled with an unnormalized point density. For convenience, we consider sources with support bounded to $[0, 1]$ to obviate this. Then as long as the quantization is fine ($\lambda(x) > 0$) wherever the density is positive, we can approximate the output entropy of the quantizer using the point density as follows:

$$\begin{aligned}
H(\hat{X}) &= -\sum_{i \in I} P(X \in S_i) \log_2 P(X \in S_i) \\
&\overset{(a)}{\approx} -\int f_X(x) \log_2 p(x)\,dx \\
&\overset{(b)}{\approx} -\int f_X(x) \log_2\left(f_X(x)/(K\lambda(x))\right) dx \\
&= -\int f_X(x) \log_2 f_X(x)\,dx + \int f_X(x) \log_2(K\lambda(x))\,dx \\
&= h(X) + \log_2 K + \mathrm{E}\left[\log_2 \lambda(X)\right],
\end{aligned} \tag{4}$$

where $p(x)$ is defined as $P(X \in S_i)$ for $x \in S_i$ and $h(X)$ is the differential entropy of $X$. Step (a) uses HR1; and step (b) uses HR1 and (2).

2) Optimal Point Densities: Once quantizer performance has been expressed in terms of point densities, optimal designs can be found easily. We derive the optimizing point densities and the resulting distortions because analogous optimizations appear in Sections III and IV.

In the fixed-rate case, the problem is to minimize $D_X$ for a given value of $K$. (The rate is $R = \log_2 K$.) An application of Hölder's inequality yields

$$\begin{aligned}
\int f_X^{1/3}(x)\,dx &= \int \left(\frac{f_X(x)}{\lambda^2(x)}\right)^{1/3} \lambda^{2/3}(x)\,dx \\
&\leq \left(\int \frac{f_X(x)}{\lambda^2(x)}\,dx\right)^{1/3} \left(\int \lambda(x)\,dx\right)^{2/3} \\
&= \left(\mathrm{E}\left[\lambda^{-2}(X)\right]\right)^{1/3},
\end{aligned}$$

with equality when $f_X(x)/\lambda^2(x)$ is proportional to $\lambda(x)$. Thus, $D_X$ is minimized by

$$\lambda(x) = \frac{f_X^{1/3}(x)}{\int f_X^{1/3}(t)\,dt}. \tag{5}$$

The resulting minimal distortion is

$$D_X \approx \frac{1}{12K^2}\left(\int f_X^{1/3}(x)\,dx\right)^3 = \frac{1}{12}\,\|f_X\|_{1/3}\, 2^{-2R}, \tag{6}$$

where we have introduced a notation for the $\mathcal{L}^{1/3}$ pseudonorm.

For the variable-rate scenario, the problem is to minimize $D_X$ for a given maximum value of $H(\hat{X})$. Starting with a rearrangement of (4) and using Jensen's inequality (with the convexity of $-\log_2(\cdot)$),

$$\begin{aligned}
2\left(H(\hat{X}) - h(X)\right) &\approx \mathrm{E}\left[-\log_2\left(K^{-2}\lambda^{-2}(X)\right)\right] \\
&\geq -\log_2 \mathrm{E}\left[K^{-2}\lambda^{-2}(X)\right] \\
&\approx -\log_2(12 D_X).
\end{aligned}$$

This translates to an approximate lower bound on $D_X$, and the inequality step holds with equality when $\lambda(X)$ is a constant. Thus $\lambda(x) = 1$ is asymptotically optimal, i.e., the quantizer should be uniform.$^1$ The corresponding minimal distortion is

$$D_X \approx \frac{1}{12}\, 2^{2h(X)}\, 2^{-2R}. \tag{7}$$

Note that both optimal point densities are positive on the entire support of $f_X$. Thus, at high enough resolution, the quantization is fine pointwise over $X$. In the functional settings, this will be used to justify piecewise linear approximation of the function $g$. Note also that both variable- and fixed-rate quantization have $\sim 2^{-2R}$, or $-6$ dB/bit, dependence of distortion on rate. This is a common feature of ordinary quantizers, but we demonstrate in Section VII that certain functional scenarios can cause distortion to fall even faster with the rate.

One way to concretely specify a quantizer from a point density is to require

$$\Lambda(\beta_i) = \frac{i - 1/2}{K}, \qquad i = 1, 2, \ldots, K,$$

where $\Lambda(x) = \int_{-\infty}^{x} \lambda(t)\,dt$ is the "cumulative" point density. However, analysis of quantizers through point densities does not rely on the precise placement of codewords and cell boundaries. Under the assumptions of high-resolution analysis, $o(1/K)$ deviations in the $\beta_i$s do not affect the distortion. We return to this point in Section III-E to partially generalize the basic analysis to discontinuous functions.
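The construction above can be illustrated numerically. The following Python sketch is not part of the original analysis: the helper names `companding_codewords` and `quantizer_mse`, the grid sizes, the test density $f_X(x) = 2x$ on $[0, 1]$, and the use of midpoint cell boundaries are all assumptions made for illustration. It places codewords by numerically inverting $\Lambda$ and evaluates the MSE by a Riemann sum.

```python
import numpy as np

def companding_codewords(point_density, K, grid_size=200_001):
    """Codewords beta_i on [0, 1] satisfying Lambda(beta_i) = (i - 1/2) / K."""
    x = np.linspace(0.0, 1.0, grid_size)
    lam = point_density(x)
    # Cumulative point density by the trapezoidal rule, normalized so Lambda(1) = 1.
    steps = 0.5 * (lam[1:] + lam[:-1]) * np.diff(x)
    Lam = np.concatenate(([0.0], np.cumsum(steps)))
    Lam /= Lam[-1]
    targets = (np.arange(1, K + 1) - 0.5) / K
    return np.interp(targets, Lam, x)  # numerical inverse of Lambda

def quantizer_mse(codewords, pdf, grid_size=200_001):
    """Approximate E[(X - Q(X))^2] for a source supported on [0, 1]."""
    x = np.linspace(0.0, 1.0, grid_size)
    boundaries = 0.5 * (codewords[1:] + codewords[:-1])  # midpoint boundaries (assumption)
    cell = np.searchsorted(boundaries, x)                # index of the cell containing x
    return np.mean((x - codewords[cell]) ** 2 * pdf(x))  # Riemann sum over [0, 1]

K = 32
uniform_pdf = lambda x: np.ones_like(x)
tri_pdf = lambda x: 2.0 * x                      # hypothetical test density
flat = lambda x: np.ones_like(x)                 # lambda(x) = 1
cube_root = lambda x: tri_pdf(x) ** (1.0 / 3.0)  # proportional to f_X^{1/3}, cf. (5)

d_uniform_source = quantizer_mse(companding_codewords(flat, K), uniform_pdf)
d_flat = quantizer_mse(companding_codewords(flat, K), tri_pdf)
d_cube_root = quantizer_mse(companding_codewords(cube_root, K), tri_pdf)

print(d_uniform_source, 1 / (12 * K**2))  # uniform source with a uniform quantizer
print(d_flat, d_cube_root)                # the cube-root density should give the smaller MSE
```

The codewords here are not re-centered to cell centroids; under the high-resolution assumptions such small deviations should not change the leading-order distortion, which is why a simple inverse-CDF placement is used in this sketch.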

3) Optimal Bit Allocation: As a final preparatory digression, we state the solution to a typical bit allocation problem that arises several times in Section IV.

Lemma 1: Suppose $D = \sum_{j=1}^{n} c_j 2^{-2R_j}$ for some positive constants $\{c_j\}_{j=1}^{n}$. Then the minimum of $D$ over the choice of $\{R_j\}_{j=1}^{n}$ subject to the constraint $\sum_{j=1}^{n} R_j = nR$ is attained with

$$R_j = R + \frac{1}{2}\log_2 \frac{c_j}{\left(\prod_{k=1}^{n} c_k\right)^{1/n}}, \qquad j = 1, 2, \ldots, n,$$

resulting in

$$D = n\left(\prod_{j=1}^{n} c_j\right)^{1/n} 2^{-2R}.$$

Proof: The result can be shown using the method of Lagrange multipliers. It appeared first in the context of bit allocation in [25]; a full proof appears in [24, Sect. 8.3]. ■
The lemma does not restrict the $R_j$s to be nonnegative or to be integers. Such restrictions are discussed in [26].
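As a small illustration of Lemma 1, the following Python sketch evaluates the allocation and the resulting distortion. The function name `allocate_bits` and the sample constants are hypothetical; like the lemma, the sketch does not force the rates to be nonnegative or integer.

```python
import numpy as np

def allocate_bits(c, R):
    """Rates and minimum distortion from Lemma 1 for D = sum_j c_j 2^(-2 R_j)."""
    c = np.asarray(c, dtype=float)
    log_gm = np.mean(np.log2(c))                  # log2 of the geometric mean of the c_j
    rates = R + 0.5 * (np.log2(c) - log_gm)       # R_j = R + (1/2) log2(c_j / geometric mean)
    distortion = c.size * 2.0 ** log_gm * 2.0 ** (-2.0 * R)
    return rates, distortion

# Hypothetical constants: larger c_j receives more bits.
rates, distortion = allocate_bits([1.0, 4.0, 16.0], R=3.0)
direct = np.sum(np.array([1.0, 4.0, 16.0]) * 2.0 ** (-2.0 * rates))
print(rates)               # rates sum to n * R
print(distortion, direct)  # closed form and direct evaluation agree
```

With this allocation every term $c_j 2^{-2R_j}$ takes the same value, which is the usual signature of the Lagrangian solution.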



III. Univariate Functional Quantization 

Let $X$ be a random variable with pdf $f_X(x)$ defined over $[0, 1]$, and let $g : [0, 1] \to \mathbb{R}$ be the function of interest. The source $X$ is quantized at rate $R$ into $\hat{X} = Q(X)$, and an estimate $\hat{g}(\hat{X})$ is formed at the decoder, where $\hat{g}$ is the estimator function. We wish to design $\hat{g}$ and $Q$ to minimize the functional distortion, $D = \mathrm{E}[(g(X) - \hat{g}(\hat{X}))^2]$.

Since we seek to answer this design question with high-resolution techniques, the function $g$ and the source $X$ must be restricted in a manner similar to Section II-B. For the moment we err on the side of being too strict. Sections VI and VII will significantly loosen these requirements.

We have already assumed $f_X$ has bounded support. Additionally, we require high-resolution assumptions HR1 and HR3 from Section II-B and the following conditions for the univariate function:
UF1. $g$ is monotonic.
UF2. $g$ is continuous on $[0, 1]$; and $g'$ and $g''$ exist and are uniformly bounded, except possibly at a finite number of points.



A. Sufficiency of $\hat{g} = g$

Throughout this paper, we assume that $\hat{g} = g$. In the univariate case, the assumed continuity of $g$ ensures this is without loss of generality.

Lemma 2: Consider a functional quantization problem with source $X$ and function of interest $g$ that is continuous on the closure of the support of $X$. Given any quantizer and estimator pair $(Q, \hat{g})$, there exists a quantizer $Q'$ with the same rate as $Q$ such that the pair $(Q', g)$ has distortion at most equal to that of $(Q, \hat{g})$.

Proof: We will prove the lemma by picking a pair $(Q', \hat{g}')$ with $\hat{g}' = g$ that minimizes the functional MSE. Decompose $Q$ as $Q = \beta_Q \circ \alpha_Q$ where $\alpha_Q : \mathbb{R} \to I$ and $\beta_Q : I \to \mathbb{R}$, and decompose $Q'$ similarly. Picking $\alpha_{Q'} = \alpha_Q$ makes the rates of $Q'$ and $Q$ equal. It remains to pick $\beta_{Q'}$ that when paired with $g$ minimizes functional distortion.

For the functional MSE to be minimized, it is necessary for the quantization decoder $\beta_Q$ and estimator $\hat{g}$ to satisfy

$$\hat{g}(\beta_Q(i)) = \mathrm{E}\left[g(X) \mid X \in S_i\right] \quad \text{for every } i \in I.$$

Even with no restriction on $\hat{g}$ and $S_i$ we must have

$$\min_{x \in S_i} g(x) \leq \mathrm{E}\left[g(X) \mid X \in S_i\right] \leq \max_{x \in S_i} g(x).$$

Now since $g$ is continuous, the intermediate value theorem implies the existence of a value $\beta_{Q'}(i)$ such that $g(\beta_{Q'}(i)) = \mathrm{E}[g(X) \mid X \in S_i]$. This specifies the desired quantization decoder $\beta_{Q'}$. ■
Note that the proof of Lemma 2 did not require $g(S_i)$ to be an interval, even though optimal quantization of $g(X)$ involves partitioning into intervals. In the natural case that each $g(S_i)$ is an interval, one can require $\beta_{Q'}(i) \in S_i$ for every $i$, as one would expect from a quantizer.

$^1$ Recall that for the variable-rate case we are assuming $f_X$ is supported on $[0, 1]$. For other bounded supports, the optimal point density would still be a constant, but perhaps different from 1. Unbounded supports require the use of an unnormalized point density.



B. Sufficiency of Regular Quantizers 

The following lemma relates monotonicity to regularity of optimal quantizers, thus justifying the introduction of Assumption UF1:

Lemma 3: If $g$ is monotonic, there exists an optimal functional quantizer of $X$ that is regular.

Proof: The optimal functional quantizer in one dimension is induced by the optimal ordinary quantizer for the variable $Y = g(X)$. That is, one may compute the function $g(X)$ and quantize it directly. Since the optimal ordinary quantizer for a real-valued source is regular, the optimal quantizer over $Y$, denoted by $Q_Y(y)$ and having points $\{y_i\}_{i \in I}$, is regular.

$Q_Y(y)$ may be implemented by a quantizer for $X$ with cells given by $g^{-1}(Q_Y^{-1}(y_i))$. We know that $Q_Y^{-1}(y_i)$ is an interval since $Q_Y$ is regular. Also, since $g$ is monotonic, the inverse map $g^{-1}$ applied to any interval in the range of $g$ gives an interval. Thus $g^{-1}(Q_Y^{-1}(y_i))$ is an interval, which demonstrates that a regular quantizer in $X$ will be optimal. ■



C. High-Resolution Distortion 

Assumption UF2 is introduced so that a piecewise linear approximation of $g$ suffices in estimating the functional distortion of the quantizer. Recalling the notation $\{\beta_i\}_{i \in I}$ for the quantizer points and $\{S_i\}_{i \in I}$ for the partition, we will show that

$$g_{\mathrm{PL}}(x) = g(\beta_i) + g'(\beta_i)(x - \beta_i), \quad \text{for } x \in S_i,\ i \in I,$$

is an adequate approximation of $g$ for our purposes. Excluding partition cells in which $g''(x)$ does not exist, for any $x \in S_i$,

$$\left|g(x) - g_{\mathrm{PL}}(x)\right| \leq \frac{1}{2}\left(\max_{\xi \in S_i} |g''(\xi)|\right)\left(\mathrm{length}(S_i)\right)^2 \tag{8}$$

by Taylor's theorem. Then, invoking Assumption UF2 and the fact that $\mathrm{length}(S_i)$ vanishes for all $S_i$s that intersect the support of $f_X$, we see that $g_{\mathrm{PL}}$ is accurate in a precise sense.

The use of $g_{\mathrm{PL}}$ prompts us to give a name to the magnitude of the derivative of $g$. The distortion is then expressed using this function.

Definition 1: The single-variate functional sensitivity profile of $g$ is defined as $\gamma(x) = |g'(x)|$.

Theorem 4: Suppose a source $X \in [0, 1]$ is quantized with a $K$-level quantizer with point density $\lambda(x)$. Further suppose that the source, quantizer, and function $g : [0, 1] \to \mathbb{R}$ satisfy Assumptions HR1-3 and UF1-2. Then

$$D = \mathrm{E}\left[\left(g(X) - g(\hat{X})\right)^2\right] \approx \frac{1}{12K^2}\,\mathrm{E}\left[\left(\frac{\gamma(X)}{\lambda(X)}\right)^2\right]. \tag{9}$$

Proof: See Appendix A.



D. Optimal Point Densities 

The distortion expression (9) bears strong resemblance to (3), but with the probability density $f_X(x)$ replaced with a surrogate density $\gamma^2(x) f_X(x)$. Optimal point densities and the resulting distortions now follow easily.

For fixed-rate coding, we are attempting to minimize the distortion (9) for a given value of $K$. Following the arguments in Section II-B2, the optimal point density is proportional to the cube root of the surrogate density:

$$\lambda(x) = \frac{\left(\gamma^2(x) f_X(x)\right)^{1/3}}{\int_0^1 \left(\gamma^2(t) f_X(t)\right)^{1/3} dt}. \tag{10}$$

The (asymptotic) optimality of this point density relies on the quantization being fine everywhere $f_X$ is positive. Thus, we must exclude the possibility that $\gamma(x) = 0$ for an interval $x \in (a, b)$ such that $P(X \in (a, b)) > 0$ because in this case the quantization is not fine for $X \in (a, b)$. We revisit this restriction in Section VII. By evaluating (9) with point density (10), the resulting distortion is

$$D \approx \frac{1}{12}\,\|\gamma^2 f_X\|_{1/3}\, 2^{-2R}. \tag{11}$$

Fig. 2. Quantizer points illustrating the point densities derived in Example 1 at rate $R = 4$.

For variable-rate coding, we are attempting to minimize the distortion subject to an upper bound on the rate given by (4). By a derivation similar to that of ordinary variable-rate quantization, the optimal point density is found to be proportional to the functional sensitivity profile:

$$\lambda(x) = \frac{\gamma(x)}{\int_0^1 \gamma(t)\,dt}. \tag{12}$$

The restriction to make the quantization fine everywhere $f_X$ is positive takes the same form as above. The high-resolution rate approximation (4) is then valid, and the resulting distortion is

$$D \approx \frac{1}{12}\,\|\gamma\|_1^2\, 2^{2h(X) + 2\mathrm{E}[\log_2 \gamma(X)]}\, 2^{-2R}. \tag{13}$$

Example 1: Suppose $X$ is uniformly distributed over $[0, 1]$ and $g(x) = x^2$. For both fixed- and variable-rate, the optimal ordinary quantizer is uniform, i.e., $\lambda_{\mathrm{ord}} = 1$. With $\gamma(x) = 2x$, evaluating (9) gives $D_{\mathrm{ord}} \approx \frac{1}{9}\, 2^{-2R} \approx 0.111 \cdot 2^{-2R}$.

The optimal point density for fixed-rate functional quantization is $\lambda_{\mathrm{fr}}(x) = \frac{5}{3}x^{2/3}$ and yields distortion

$$D_{\mathrm{fr}} \approx \frac{1}{12}\,\|(2x)^2\|_{1/3} \cdot 2^{-2R} = \frac{9}{125}\, 2^{-2R} \approx 0.072 \cdot 2^{-2R}.$$

The optimal point density for variable-rate functional quantization is $\lambda_{\mathrm{vr}}(x) = 2x$. With $\|\gamma\|_1 = 1$, $h(X) = 0$, and $\mathrm{E}[\log_2 \gamma(X)] = 1 - 1/(\ln 2)$, the resulting distortion is

$$D_{\mathrm{vr}} \approx \frac{1}{12} \cdot 4e^{-2} \cdot 2^{-2R} \approx 0.045 \cdot 2^{-2R}.$$

Quantizers designed with the three derived optimal point densities are illustrated in Fig. 2 for rate $R = 4$. The functionally-optimized quantizers put more points at higher values of $x$, where the function varies more quickly. In addition, the variable-rate quantizer is allowed more points ($K = 21$) while meeting the rate constraint.

The interested reader can verify that $D_{\mathrm{fr}}$ and $D_{\mathrm{vr}}$ exactly match the performance obtained by designing optimal quantizers for $X^2$. □
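The constants in Example 1 can be checked numerically. The following Python sketch is an illustration only; the grid size, the midpoint-rule helper `integral`, and the variable names are assumptions. It evaluates the distortion constants from (9), (11), and (13), and the number of variable-rate levels implied by the rate approximation (4) at $R = 4$.

```python
import numpy as np

N = 1_000_000
x = (np.arange(N) + 0.5) / N   # midpoint grid on [0, 1]
f = np.ones_like(x)            # pdf of X ~ Uniform[0, 1]
gamma = 2.0 * x                # sensitivity profile |g'(x)| for g(x) = x^2

def integral(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return values.mean()

# Ordinary (uniform) quantizer: (9) with lambda = 1.
c_ord = integral(gamma**2 * f) / 12

# Fixed-rate functional quantizer: (11), i.e. the L^{1/3} pseudonorm of gamma^2 f_X.
c_fr = integral((gamma**2 * f) ** (1.0 / 3.0)) ** 3 / 12

# Variable-rate functional quantizer: (13).
norm1 = integral(gamma)                 # ||gamma||_1
h = -integral(f * np.log2(f))           # differential entropy h(X)
e_log = integral(f * np.log2(gamma))    # E[log2 gamma(X)]
c_vr = norm1**2 * 2.0 ** (2 * h + 2 * e_log) / 12

# Levels K solving (4) for R = 4 with lambda = gamma / ||gamma||_1.
K_vr = norm1 * 2.0 ** (4.0 - h - e_log)

print(c_ord, c_fr, c_vr)   # multipliers of 2^(-2R)
print(K_vr)                # non-integer; rounding down gives the number of levels
```

The three multipliers correspond to $1/9$, $9/125$, and $4e^{-2}/12$, and the level count rounds down to the $K = 21$ quoted above.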

The example shows that even for univariate functions, there are benefits from functional quantization. While quantizing X 
instead of g(X) seems naive, as we move to the distributed multivariate case it will not be possible to compute the function 
before quantization. The approach of linearly approximating g will generalize to allow optimization of quantizers. 



E. Discontinuous Functions 

Our main result on univariate functional quantization, Theorem 4, assumes the continuity of $g$. One can effectively sidestep this assumption, but doing so requires the quantizer to be described more precisely than by a point density function alone.

For simplicity, assume $f_X$ is strictly positive on $[0, 1]$. Suppose we were to allow $g$ to have a point of discontinuity $x_0 \in (0, 1)$ with

$$c = \lim_{\delta \to 0} \left|g(x_0 + \delta) - g(x_0 - \delta)\right| > 0.$$

The difficulty that arises is that if $x_0$ is an interior point of a partition cell $S_i$, this cell produces a component of the functional distortion proportional to $c^2 P(X \in S_i)$. Since $c^2 P(X \in S_i) = \Theta(K^{-1})$, it is not negligible in comparison to the (best case) $\Theta(K^{-2})$ functional distortion. Thus having a point of discontinuity of $g$ in the interior of a partition cell disrupts the asymptotic distortion calculation (9).

The representation of quantizers by number of levels $K$ and point density function $\lambda$ does not allow us to prevent a point of discontinuity from falling in the interior of a partition cell. However, if we augment the description of the quantizer with specified partition boundaries, we can still obtain the distortion estimate (9).



Corollary 5: Suppose a $K$-level quantizer for a source $X \in [0, 1]$ is described by point density function $\lambda(x)$. Further suppose that the source, quantizer, and function $g : [0, 1] \to \mathbb{R}$ satisfy Assumptions HR1-3 and UF1-2 with the exception of discontinuities at $M$ points $\{x_m\}_{m=1}^{M}$. Then a $(K + M)$-level quantizer obtained by adding partition cell boundaries at $\{x_m\}_{m=1}^{M}$ will have distortion

$$D = \mathrm{E}\left[\left(g(X) - g(\hat{X})\right)^2\right] \approx \frac{1}{12K^2}\,\mathrm{E}\left[\left(\frac{\gamma(X)}{\lambda(X)}\right)^2\right].$$

Proof: The proof is omitted, as it requires only minor modifications of the proof of Theorem 4 in Appendix A. ■
In the sequel, we will not consider discontinuous functions. It seems that a multivariate extension of Corollary 5 would require points of discontinuity to be in the Cartesian product of finite sets of discontinuity for each variable. Such separable sets of points of discontinuity are not general and can be handled rather intuitively.



IV. Multivariate Functional Quantization 
With Section III as a warm-up, we can now address the actual distributed functional scalar quantization problem. Let $X_1^n$ be a random vector with joint pdf $f_{X_1^n}(x_1^n)$ defined over $[0, 1]^n$, and let $g : [0, 1]^n \to \mathbb{R}$ be the function of interest. As depicted in Fig. 1, each source $X_j$ is quantized at rate $R_j$ into $\hat{X}_j = Q_j(X_j)$, separately, and an estimate $\hat{g}(\hat{X}_1^n)$ is formed at the decoder, where $\hat{g}$ is the estimator function. We wish to design $\hat{g}$ and $Q_j$, $j = 1, 2, \ldots, n$, to minimize the functional distortion, $D = \mathrm{E}[(g(X_1^n) - \hat{g}(\hat{X}_1^n))^2]$.



A. Assumptions 

As in Section III, we will impose restrictions on the function $g$ and the joint distribution of $X_1^n$ so that a local affine approximation is effective. For $j \in \{1, 2, \ldots, n\}$, let $\{\beta_{j,i}\}_{i \in I_j}$ denote the quantization points and $\{S_{j,i}\}_{i \in I_j}$ the partition cells of the quantizer $Q_j$. We require each partition to satisfy Assumption HR3. As a multivariate counterpart to Assumption HR1, we require:

HR1'. $f_{X_1^n}$ is smooth enough that it may be approximated as constant on each cell $S_{1,i_1} \times S_{2,i_2} \times \cdots \times S_{n,i_n}$ in the rectangular partition induced by all $n$ quantizers together.$^2$
To simplify the proof of the main result, we make a slightly stronger smoothness assumption on multivariate function $g$ than in the previous section:
MF1. $g$ is monotonic in each variable.
MF2. $g$ is twice continuously differentiable on $[0, 1]^n$.

With the monotonicity requirement, Lemma 3 applies separately to each quantizer to show that designing regular quantizers does not preclude optimality. We will analyze the case of $\hat{g} = g$ and then formally justify this by showing that the difference in distortions between using $\hat{g} = g$ and the optimal $\hat{g}$ is asymptotically negligible (Theorem 7).



B. High-Resolution Distortion 

Our main technical task in finding the optimal quantizers is to justify an approximation of the distortion in terms of point 
density functions. Since the quantization is distributed, our concept of functional sensitivity is now extended to each variable 
separately, with averaging over all the remaining variables. 

Definition 2: The $j$th functional sensitivity profile of $g$ is defined as

$$\gamma_j(x) = \left(\mathrm{E}\left[\,|g_j(X_1^n)|^2 \mid X_j = x\right]\right)^{1/2},$$

where $g_j(x_1^n)$ denotes $\partial g(x_1^n)/\partial x_j$.
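Definition 2 can be evaluated numerically when the conditional expectation is easy to compute. The following Python sketch is an illustration under stated assumptions: $n = 2$, the sources are independent and uniform on $[0, 1]$ (so conditioning on $X_j = x$ reduces to averaging over the other variable), and the function $g(x_1, x_2) = x_1^2 + x_1 x_2$ and the helper name `sensitivity_profiles_2d` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def sensitivity_profiles_2d(dg_dx1, dg_dx2, x, m=2001):
    """Return (gamma_1(x), gamma_2(x)) for independent Uniform[0, 1] sources X1, X2."""
    x = np.asarray(x, dtype=float)
    t = (np.arange(m) + 0.5) / m   # midpoint nodes for the averaged-out variable
    shape = (x.size, m)
    # Rows index the conditioning value x; columns index the other variable.
    d1 = np.broadcast_to(dg_dx1(x[:, None], t[None, :]), shape)
    d2 = np.broadcast_to(dg_dx2(t[None, :], x[:, None]), shape)
    gamma1 = np.sqrt(np.mean(d1**2, axis=1))   # (E[|g_1|^2 | X1 = x])^(1/2)
    gamma2 = np.sqrt(np.mean(d2**2, axis=1))   # (E[|g_2|^2 | X2 = x])^(1/2)
    return gamma1, gamma2

# Hypothetical example: g(x1, x2) = x1^2 + x1*x2, monotonic in each variable on [0, 1]^2.
x = np.linspace(0.0, 1.0, 5)
gamma1, gamma2 = sensitivity_profiles_2d(
    lambda x1, x2: 2 * x1 + x2,   # partial derivative with respect to x1
    lambda x1, x2: x1 + 0 * x2,   # partial derivative with respect to x2
    x,
)
print(gamma1)  # compare with sqrt(4 x^2 + 2 x + 1/3)
print(gamma2)  # compare with the constant 1/sqrt(3)
```

For this choice, $\gamma_1$ grows with $x$ while $\gamma_2$ is constant, so only the quantizer for $X_1$ would be expected to depart from the ordinary design.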

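Theorem 6: Suppose $n$ sources $X_1^n \in [0, 1]^n$ are quantized in a distributed manner with a $K_j$-level quantizer with point density $\lambda_j$ applied to $X_j$. Further suppose that the source, quantizers, and function $g : [0, 1]^n \to \mathbb{R}$ satisfy the assumptions of Section IV-A. Then

$$D = \mathrm{E}\left[\left(g(X_1^n) - g(\hat{X}_1^n)\right)^2\right] \approx \sum_{j=1}^{n} \frac{1}{12K_j^2}\,\mathrm{E}\left[\left(\frac{\gamma_j(X_j)}{\lambda_j(X_j)}\right)^2\right]. \tag{14}$$

Proof: See Appendix B.

Theorem 7: Assume the conditions of Theorem 6 and denote the performance of the optimal estimator by

$$D_{\mathrm{opt}} = \mathrm{E}\left[\left(g(X_1^n) - \mathrm{E}\left[g(X_1^n) \mid \hat{X}_1^n\right]\right)^2\right].$$

Then $D \approx D_{\mathrm{opt}}$.

Proof: See Appendix C.

$^2$ See [27] for a discussion of this local uniformity condition.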
C. Optimal Point Densities

Theorem 6 uses the functional sensitivity profiles to decouple our design problem into $n$ separate problems of designing a single point density $\lambda_j$. Furthermore, each design problem (the minimization of a term of (14)) is of a familiar form. Thus we obtain the following theorem.

Theorem 8: For fixed {Kj}" =1 — corresponding to fixed-rate quantizers at specified rates — the distortion expression (TBI i is 
minimized by the choice 



X i(*) = ,,„,.;. J 1/3 .. . 3 = 1. 2 . . . , n, (15) 



resulting in distortion 

n 1 1 n 

J^hjfx* lli/3 = 7^ £ Hi/ 3 2- 2 ^, (16) 

/ I J 3=1 

where Rj = log 2 is the rate of Qj. If Yjj=i Rj ' s fixed to nR with i? large enough and no requirement that {2 flj }™ =1 be 
integers, the minimum distortion 

^-^fllH/xJIvs] (17) 



is achieved with A,s given by (fl~5T > and 



y 2 ) 

fl,- = iZ+^log a — ",, J „7";;" <1/n , i = l,2...,n. (18) 



- log 2 — 

2 " (nLi ii7fc/xjii/ 3/ 



For fixed entropies $\{H(\hat{X}_j)\}_{j=1}^n$ (corresponding to variable-rate quantizers at specified resolutions), the distortion (14) is minimized by the choice

$$\lambda_j(x) = \frac{\gamma_j(x)}{\int_0^1 \gamma_j(t) \, dt}, \qquad j = 1, 2, \ldots, n. \qquad (19)$$

As long as each $\lambda_j$ is positive wherever $f_{X_j}$ is positive, the high-resolution rate analysis is valid, and the resulting distortion can be written as

$$D \approx \frac{1}{12} \sum_{j=1}^n \| \gamma_j \|_1^2 \, 2^{2h(X_j) + 2\mathrm{E}[\log_2 \gamma_j(X_j)]} \, 2^{-2R_j}, \qquad (20)$$

where $R_j = H(\hat{X}_j)$ is the output entropy of $Q_j$. If $\sum_{j=1}^n R_j$ is fixed to $nR$ and $R$ is large enough, the minimum distortion

$$D \approx \frac{n}{12} \left( \prod_{j=1}^n \| \gamma_j \|_1^2 \, 2^{2h(X_j) + 2\mathrm{E}[\log_2 \gamma_j(X_j)]} \right)^{1/n} 2^{-2R} \qquad (21)$$

is achieved with $\lambda_j$'s given by (19) and

$$R_j = R + \frac{1}{2} \log_2 \frac{\| \gamma_j \|_1^2 \, 2^{2h(X_j) + 2\mathrm{E}[\log_2 \gamma_j(X_j)]}}{\left( \prod_{k=1}^n \| \gamma_k \|_1^2 \, 2^{2h(X_k) + 2\mathrm{E}[\log_2 \gamma_k(X_k)]} \right)^{1/n}} \qquad (22)$$

for $j = 1, 2, \ldots, n$.

Proof: To prove (15), (16), (19), and (20), it suffices to note that minimizing the $n$ terms of (14) separately gives problems identical to those in Section III.

Minimizing (16) through the choice of rates summing to $nR$ is precisely addressed by Lemma 1; this yields (17) and (18). Bit allocation (22) and resulting distortion (21) similarly follow easily from (20). ■
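The fixed-rate bit allocation (18) is easy to evaluate once the pseudonorms $\| \gamma_j^2 f_{X_j} \|_{1/3}$ are known. The sketch below uses made-up values for those pseudonorms (labelled hypothetical in the code) and checks that the allocation meets the sum-rate constraint and attains (17). Function names are illustrative.

```python
import math

def fixed_rate_allocation(norms, R):
    """Rates R_j from (18), given c_j = ||gamma_j^2 f_{X_j}||_{1/3} and average rate R."""
    n = len(norms)
    log_geo_mean = sum(math.log2(c) for c in norms) / n
    return [R + 0.5 * (math.log2(c) - log_geo_mean) for c in norms]

def fixed_rate_distortion(norms, rates):
    """Distortion (16): (1/12) * sum_j c_j * 2^(-2 R_j)."""
    return sum(c * 2.0 ** (-2 * r) for c, r in zip(norms, rates)) / 12.0

norms = [4.0, 1.0, 0.25]  # hypothetical values of ||gamma_j^2 f_{X_j}||_{1/3}
R = 6.0
rates = fixed_rate_allocation(norms, R)      # more bits go to more sensitive components
d_opt = fixed_rate_distortion(norms, rates)  # equals (17) for this allocation
d_equal = fixed_rate_distortion(norms, [R] * len(norms))
print(rates, d_opt, d_equal)
```

With these values the geometric mean of the pseudonorms is 1, so the allocation shifts one bit from the least sensitive component to the most sensitive one, and every component then contributes equally to the distortion.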

D. Variable-Rate with Slepian-Wolf Coding

Distortion expressions (17) and (21) are minimum distortions subject to a sum-rate constraint. The individual rates given by $R_j = \log_2 K_j$ or by (4) implicitly specify no entropy coding or separate entropy coding of the $\hat{X}_j$'s, respectively.

If the $\hat{X}_j$'s are not independent (which is anticipated whenever the $X_j$'s are not independent), one may employ Slepian-Wolf coding of the $\hat{X}_j$'s without violating the distributed coding requirement implicit in Fig. 1. This lowers the total rate from $\sum_{j=1}^n H(\hat{X}_j)$ to $H(\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n)$. In this section we study how the inclusion of Slepian-Wolf coding affects the optimal quantizers and the resulting performance.

Following the development of (4) line-by-line gives a high-resolution joint entropy estimate

$$H(\hat{X}_1^n) \approx h(X_1^n) + \sum_{j=1}^n \log_2 K_j + \sum_{j=1}^n \mathrm{E}\left[ \log_2 \lambda_j(X_j) \right], \qquad (23)$$

where $h(X_1^n)$ is the joint differential entropy of $X_1^n$. The distortion expression (14) does not depend on the presence or absence of Slepian-Wolf coding.

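Theorem 9: The minimum of the distortion (14) over the choice of point densities $\{\lambda_j\}_{j=1}^n$ and resolutions $\{K_j\}_{j=1}^n$ subject to upper bound $nR$ on joint entropy (23) is

$$D \approx \frac{n}{12} \, 2^{2h(X_1^n)/n} \left( \prod_{j=1}^n \| \gamma_j \|_1^2 \, 2^{2\mathrm{E}[\log_2 \gamma_j(X_j)]} \right)^{1/n} 2^{-2R}. \qquad (24)$$

It may be attained by the point densities (19) and $\{K_j\}_{j=1}^n$ satisfying

$$R_j = h(X_j \mid X_1^{j-1}) + \log_2 K_j + \mathrm{E}\left[ \log_2 \lambda_j(X_j) \right], \qquad (25)$$

where

$$R_j = R + \frac{1}{2} \log_2 \frac{\| \gamma_j \|_1^2 \, 2^{2\mathrm{E}[\log_2 \gamma_j(X_j)]}}{\left( \prod_{k=1}^n \| \gamma_k \|_1^2 \, 2^{2\mathrm{E}[\log_2 \gamma_k(X_k)]} \right)^{1/n}}. \qquad (26)$$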

Proof: First suppose that the $K_j$'s are fixed. For clarity define

$$D_j = \frac{1}{12 K_j^2} \, \mathrm{E}\left[ \left( \frac{\gamma_j(X_j)}{\lambda_j(X_j)} \right)^2 \right], \qquad j = 1, 2, \ldots, n.$$

Then since $nR = \sum_{j=1}^n R_j$ and $D = \sum_{j=1}^n D_j$, we have $n$ decoupled optimizations: for $j = 1, 2, \ldots, n$, minimize $D_j$ subject to an upper bound on $R_j$. The solution for any $j$ is to use the point density (19), resulting in distortion component

$$D_j \approx \frac{1}{12} \| \gamma_j \|_1^2 \, 2^{2h(X_j \mid X_1^{j-1}) + 2\mathrm{E}[\log_2 \gamma_j(X_j)]} \, 2^{-2R_j}.$$

Now minimizing $\sum_{j=1}^n D_j$ subject to a constraint on $\sum_{j=1}^n R_j$ is addressed by Lemma 1. It yields (24) and (26) when one notes that the product $\prod_{j=1}^n 2^{2h(X_j \mid X_1^{j-1})}$ that appears in the product $\prod_{j=1}^n D_j$ is $2^{2h(X_1^n)}$, by the chain rule of differential entropy. ■

Some remarks: 

1) By comparing (24) to (21), we see that the inclusion of Slepian-Wolf coding has reduced the sum rate to achieve any given distortion by

$$\sum_{j=1}^n h(X_j) - h(X_1^n).$$

This is, of course, not unexpected as it represents the excess information in the product of marginal distributions as compared to the joint distribution. This has been termed the multiinformation [28] and equals the mutual information when $n = 2$.

2) The use of $h(X_j \mid X_1^{j-1})$ in (25) is somewhat arbitrary. It can be replaced by any achievable point on the Slepian-Wolf joint-entropy boundary. The optimal $R_j$'s and resulting distortion $D$ would not be affected, but the $K_j$'s would change. One can interpret this as a flexibility in resolution allocation (slightly distinct from bit allocation) that can be used to control inaccuracies due to the high-resolution approximations.

3) The theorem seems to analytically separate correlations among sources from functional considerations, exploiting correlation even though the quantizers are regular. In reality, the binning introduced by Slepian-Wolf coding makes the quantizers effectively nonregular to remove redundancy between sources.



E. Relationship to Locally-Quadratic Distortion Measures 

In [17], the authors consider the class of "locally-quadratic" distortion measures for variable-rate high-resolution quantization. They define locally-quadratic measures as those having the following two properties:

1) Let $x$ be in $\mathbb{R}^n$. For $y$ sufficiently close to $x$ in the Euclidean metric, the distortion between $x$ and $y$ is well approximated by $\sum_{i=1}^n M_i(x) |x_i - y_i|^2$, where $M_i(x)$ is a positive scaling factor. In other words, the distortion is a space-varying non-isotropically scaled MSE.

2) The distortion between two points is zero if and only if the points are identical.

For these distortion measures, they consider high-resolution variable-rate regular quantization, generalize Bucklew's results [16] to non-functional distortion measures, and demonstrate the use of multidimensional companding functions to implement these quantizers. Of particular interest is the comparison they perform between joint vector quantization and separable scalar quantization. When Slepian-Wolf coding is employed for the latter, the scenario is similar to the developments of this section.

The source of this similarity is the implicit distortion measure we work with: $d_g(x, y) = |g(x) - g(y)|^2$. When $x$ and $y$ are very close to each other, Taylor approximation reduces this error to a quadratic form:

$$|g(x) - g(y)|^2 \approx \sum_{i=1}^n \left| \frac{\partial g(x_1^n)}{\partial x_i} \right|^2 |x_i - y_i|^2.$$

From this, one may obtain the same variable-rate Slepian-Wolf performance as (24) via the locally quadratic approach.

However, there are important differences between locally-quadratic distortion measures and the functional distortion measures we consider. First and foremost: a scalar function of $n$ variables, $n > 1$, is guaranteed to have an uncountable number of pairs $x \neq y$ for which $g(x) = g(y)$ and therefore that $d_g(x, y) = 0$.³ This violates the second condition of a locally-quadratic distortion measure, and the repercussions are felt most strikingly for non-monotonic functions, those for which regular quantizers are not necessarily optimal (see Section VI).

The second condition is also violated by functions that are not strictly monotonic in each variable; one finds that without strictness, variable-rate analysis of the centralized encoding problem is invalidated. Specifically, if the derivative vector

$$\left( \frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}, \ldots, \frac{\partial g}{\partial x_n} \right)$$

has nonzero probability of possessing a zero component, the expected variable-rate distortion as derived by both Bucklew and Linder et al. is $D = 0$, regardless of rate. This nonsensical answer arises from the null derivative having violated the high-resolution approximation. Given that even the several example functions we consider in the following section fall into this trap, the raw centralized analysis has limited applicability to functional scenarios. In future work, generalizations of our results in Section VII may be able to address such deficiencies.



V. Examples 

Before moving on to extensions of the basic theory, which complicate matters, we present a few examples to show how 
optimal ordinary scalar quantization and optimal DFSQ differ. We especially want to highlight a few simple examples in which 
performance scaling with respect to $n$ differs greatly between ordinary and functionally-optimized quantization.

Example 2 (Linear function): Consider the function $g(x_1^n) = \sum_{j=1}^n a_j x_j$ where the $a_j$'s are scalars. Then for any $j$, $\gamma_j(x) = |a_j|$. Since $\gamma_j(x)$ does not depend on $x$, it has no influence on the optimal point density for either the fixed- or variable-rate case; see (15) and (19).

Although $\gamma_j(x)$ gives no information on which values of $X_j$ are more important than others (or rather shows that they are all equally important), the set of $\gamma_j$'s shows the relative importance of the components. This is reflected in optimal bit allocations computed via (18) or (22). □

Example 3 (Maximum): Let the set of sources $X_1^n$ be uniformly distributed on $[0, 1]^n$ and hence mutually independent. Consider the function

$$g(x_1^n) = \max(x_1, x_2, \ldots, x_n).$$

Though very simple, this function is more interesting than a linear function because the derivative with respect to one variable depends sharply on all the others. The function is symmetric in its arguments, so for notational convenience consider only the design of the quantizer for $X_1$.

The partial derivative $g_1(x_1^n)$ is 1 where the maximum is $x_1$ and is 0 otherwise. Thus,

$$\gamma_1^2(x) = \mathrm{E}\left[ |g_1(X_1^n)|^2 \mid X_1 = x \right] = P\left( \max(X_1^n) = X_1 \mid X_1 = x \right) = x^{n-1},$$

where the final step uses the probability of all $n - 1$ variables $X_2^n$ being less than $x$.
The optimal point density for fixed-rate quantization is found by evaluating (15) to be

$$\lambda_1(x) = \tfrac{1}{3}(n + 2) \, x^{(n-1)/3}.$$

³ If not, $\mathbb{R}^n$ has the same size as $\mathbb{Z} \times \mathbb{R}$, which is an absurdity.

Fig. 4. Distortions of optimal fixed- and variable-rate functional quantizers for maximum and median functions from Examples 3 and 4. Shown is the dependence on the number of variables $n$; by plotting $D \cdot 12 \cdot 2^{2R}$ we see the performance relative to ordinary quantization.



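The resulting distortion when each quantizer has rate $R$ is found by evaluating (17) to be

$$D_{\mathrm{fr}} \approx \frac{n}{12} \left\| \gamma_1^2 f_{X_1} \right\|_{1/3} 2^{-2R} = \frac{n}{12} \left( \frac{3}{n+2} \right)^3 2^{-2R} = \frac{9n}{4(n+2)^3} \, 2^{-2R}.$$

The optimal point density for variable-rate quantization is found by evaluating (19) to be

$$\lambda_1(x) = \tfrac{1}{2}(n + 1) \, x^{(n-1)/2}.$$

Substituting $\| \gamma_1 \|_1 = 2/(n + 1)$, $h(X_1) = 0$, and $2^{2\mathrm{E}[\log_2 \gamma_1(X_1)]} = e^{-n+1}$ into (21) gives

$$D_{\mathrm{vr}} \approx \frac{n}{12} \cdot \frac{4}{(n+1)^2} \, e^{-n+1} \, 2^{-2R} = \frac{n}{3(n+1)^2} \, e^{-n+1} \, 2^{-2R}.$$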

The two computed distortions decrease sharply with $n$. This is in stark contrast to the results of ordinary quantization. When functional considerations are ignored, one optimally uses a uniform quantizer, resulting in $\mathrm{E}[(X_j - \hat{X}_j)^2] \approx \frac{1}{12} 2^{-2R}$ for any component. Since the maximum is equal to one of the components, the functional distortion is $D_{\mathrm{ord}} \approx \frac{1}{12} 2^{-2R}$, unchanging with $n$.

The optimal point densities computed above are shown in Fig. 3. The distortions are presented along with the results of the following example in Fig. 4. □
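The fixed-rate numbers in Example 3 can be reproduced numerically. The sketch below evaluates the pseudonorm $\| \gamma_1^2 f_{X_1} \|_{1/3}$ with a simple midpoint rule (an implementation choice made here, with hypothetical function names) and compares $D_{\mathrm{fr}} \cdot 2^{2R}$ against the closed form $9n / (4(n+2)^3)$ and against the ordinary-quantization value $1/12$.

```python
def l13_norm(func, num_cells=20000):
    """Midpoint-rule value of ||func||_{1/3} = (integral_0^1 func(x)^(1/3) dx)^3."""
    h = 1.0 / num_cells
    integral = sum(func((i + 0.5) * h) ** (1.0 / 3.0) for i in range(num_cells)) * h
    return integral ** 3

def max_fixed_rate_coefficient(n):
    """Numerical D_fr * 2^(2R) for the maximum of n i.i.d. Uniform[0,1] sources."""
    # gamma_1^2(x) = x^(n-1) and f_{X_1} = 1 on [0, 1]
    return n / 12.0 * l13_norm(lambda x: x ** (n - 1))

def max_fixed_rate_closed_form(n):
    """Closed form from Example 3: 9n / (4 (n + 2)^3)."""
    return 9.0 * n / (4.0 * (n + 2) ** 3)

ordinary = 1.0 / 12.0  # D_ord * 2^(2R), independent of n
for n in (1, 2, 5, 10):
    print(n, max_fixed_rate_coefficient(n), max_fixed_rate_closed_form(n), ordinary)
```

For $n = 1$ the functional and ordinary designs coincide; for larger $n$ the functional coefficient falls roughly as $1/n^2$.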

Example 4 (Median): Let $n = 2m + 1$, $m \in \mathbb{N}$, and again let the set of sources $X_1^n$ be uniformly distributed on $[0, 1]^n$. The function

$$g(x_1^n) = \operatorname{median}(x_1, x_2, \ldots, x_n)$$

provides a similar but more complicated example.

The partial derivative $g_1(x_1^n)$ is 1 where the median is $x_1$ and is 0 otherwise. Thus,

$$\gamma_1^2(x) = P\left( \operatorname{median}(X_1^n) = X_1 \mid X_1 = x \right) = \binom{2m}{m} x^m (1 - x)^m,$$

where the final step uses the binomial probability for the event of exactly $m$ of the $2m$ variables exceeding $x$.
The optimal point density for fixed-rate quantization is found by evaluating (15) to be

$$\lambda_1(x) = \frac{x^{m/3} (1 - x)^{m/3}}{B(m/3 + 1, m/3 + 1)},$$

where $B$ is the beta function. The resulting distortion when each quantizer has rate $R$ is found by evaluating (17) to be

$$D_{\mathrm{fr}} \approx \frac{2m+1}{12} \left\| \gamma_1^2 f_{X_1} \right\|_{1/3} 2^{-2R} = \frac{2m+1}{12} \binom{2m}{m} \left( B\left( \frac{m}{3} + 1, \frac{m}{3} + 1 \right) \right)^3 2^{-2R}.$$

To understand the trend for large $m$, we can substitute in the Stirling approximations $\binom{2m}{m} \sim (m\pi)^{-1/2} 2^{2m}$ and

$$B(m/3 + 1, m/3 + 1) \sim \sqrt{6\pi/m} \; 2^{-(2m/3 + 3/2)}$$

to obtain

$$D_{\mathrm{fr}} \sim \frac{m}{6} \cdot \frac{2^{2m}}{\sqrt{m\pi}} \left( \frac{6\pi}{m} \right)^{3/2} 2^{-(2m + 9/2)} \, 2^{-2R} = \frac{\pi \sqrt{3}}{16 m} \, 2^{-2R}.$$



The optimal point density for variable-rate quantization is found by evaluating (19) to be

$$\lambda_1(x) = \frac{x^{m/2} (1 - x)^{m/2}}{B(m/2 + 1, m/2 + 1)}.$$

To evaluate the resulting distortion, note that

$$\| \gamma_1 \|_1^2 = \binom{2m}{m} \left( B\left( \frac{m}{2} + 1, \frac{m}{2} + 1 \right) \right)^2,$$

$h(X_1) = 0$, and $2^{2\mathrm{E}[\log_2 \gamma_1(X_1)]} = \binom{2m}{m} e^{-2m}$. Substituting into (21) gives

$$D_{\mathrm{vr}} \approx \frac{2m+1}{12} \binom{2m}{m}^2 \left( B\left( \frac{m}{2} + 1, \frac{m}{2} + 1 \right) \right)^2 e^{-2m} \, 2^{-2R}.$$

Using the approximation above for the binomial factor and

$$B(m/2 + 1, m/2 + 1) \sim \sqrt{4\pi/m} \; 2^{-(m + 3/2)},$$

we obtain

$$D_{\mathrm{vr}} \sim \frac{m}{6} \cdot \frac{2^{4m}}{m\pi} \cdot \frac{4\pi}{m} \, 2^{-(2m+3)} \, e^{-2m} \, 2^{-2R} = \frac{1}{12m} \left( \frac{e}{2} \right)^{-2m} 2^{-2R}.$$

The optimal point densities computed above are shown in Fig. 5. The distortions are presented along with the results of Example 3 in Fig. 4.

Note the following similarities to Example 3: $D_{\mathrm{ord}}$ is constant with respect to $n$, $D_{\mathrm{fr}}$ decays polynomially with $n$, and $D_{\mathrm{vr}}$ decays exponentially with $n$. □
Additional examples and details appear in [29].
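For the median, the exact fixed-rate coefficient and its large-$m$ approximation can be compared directly. The sketch below is illustrative only: it evaluates the beta function through `math.lgamma` (an implementation choice made here) and uses hypothetical function names. Because the approximation replaces $2m + 1$ by $2m$ and uses leading-order Stirling terms, agreement is expected only for large $m$.

```python
import math

def beta(a, b):
    """Beta function B(a, b), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def median_fixed_rate_coefficient(m):
    """Exact D_fr * 2^(2R) for the median of n = 2m + 1 i.i.d. Uniform[0,1] sources."""
    n = 2 * m + 1
    return n / 12.0 * math.comb(2 * m, m) * beta(m / 3 + 1, m / 3 + 1) ** 3

def median_fixed_rate_large_m(m):
    """Large-m approximation from Example 4: pi * sqrt(3) / (16 m)."""
    return math.pi * math.sqrt(3.0) / (16.0 * m)

for m in (1, 5, 50, 300):
    exact = median_fixed_rate_coefficient(m)
    approx = median_fixed_rate_large_m(m)
    print(m, exact, approx, exact / approx)
```

The ratio in the last column drifts toward 1 as $m$ grows, consistent with the polynomial decay of $D_{\mathrm{fr}}$ in $n$.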



Fig. 5. Optimal point densities for Example 4 (median), $n = 1, 3, \ldots, 21$: (a) fixed-rate; (b) variable-rate. As $n$ increases, the sensitivities $\gamma_j(x)$ become more unbalanced toward $x = 1/2$; this is reflected in the point densities, more so in the variable-rate case than in the fixed-rate case.

Fig. 6. Two versions of a function $g$ of two variables are shown. The left $g$ is separable and $X_1$ is best quantized by a non-regular quantizer; for the right function (a rotated version of the left), a regular quantizer is asymptotically optimal. This is due to the right function being "equivalence-free."



VI. Non-Monotonic Functions and Non-Regular Quantization 

The high-resolution approach to quantizer optimization is inherently limited to the design of regular quantizers. In particular, a point density function describes only the quantizer point locations; the partition is implicit. The analysis of Section IV therefore gave us the best quantizers within the class of regular quantizers, and it was the restriction of attention to monotonic functions that ensured global optimality.

In this section we explore less restrictive alternatives to the monotonicity requirement. Specifically, we introduce the concept of an equivalence-free function and show that if a function has this property, then at a high enough rate, the optimal functional quantizers must be regular.

Fig. 6 illustrates the concept. The function on the left is aligned with the axes in the sense that $g(x_1, x_2)$ depends only on $x_1$. Since the dependence on $x_1$ is not monotonic, there are pairs $(x_1^\dagger, x_1^\ddagger)$ where $g(x_1^\dagger, x_2) = g(x_1^\ddagger, x_2)$ and thus the optimal quantizer at high enough resolution has $Q_1(x_1^\dagger) = Q_1(x_1^\ddagger)$, giving a non-regular quantizer. When the function is rotated as shown on the right, there continues to be a lack of monotonicity but at high enough rate it can no longer be exploited. For some fixed $x_2$ there may be pairs $(x_1^\dagger, x_1^\ddagger)$ such that $g(x_1^\dagger, x_2) = g(x_1^\ddagger, x_2)$, but since the equality does not hold for all $x_2$, at sufficiently high rate it will not pay to have $Q_1(x_1^\dagger) = Q_1(x_1^\ddagger)$.

Our approach is to first create a model for high-resolution non-regular quantization, then to use this model to expand the class 
of functions for which regular quantization is optimal, and finally to construct asymptotically optimal non-regular quantizers 
when regularity is suboptimal. 

A. High-Resolution Non-Regular Quantization 

To accommodate non-regular quantization, we extend the compander-based model of quantization. Companding is to implement a non-uniform quantizer as $w^{-1}(q(w(x)))$ where $w$ is a compressor, $q$ is a uniform quantizer, and $w^{-1}$ is an expander. The reader is referred to [2] for additional details and references to original sources.

In Bennett's development of optimal companding, it is natural to require $w$ to be both monotonic and have a bounded derivative everywhere; the derivative $w'(x)$ is proportional to the quantizer point density $\lambda(x)$ that has been central in our development thus far. Whether we look at $\lambda$ or $w$, the role is to set the relative sizes of the quantization cells.

Since optimal functional quantizers are not necessarily regular, we adapt the conventional development to implement non-regular quantizers.

Fig. 7. Example of a generalized compressor $w_1(x_1)$ for a function $g(x_1, x_2)$ and the partition resulting from uniform quantization of $w_1(X_1)$: (a) function of interest; (b) generalized compressor. Notice that the compressor dictates both the relative sizes of cells and the binning of intervals of $X_1$ values.

Definition 3: A function $w$ is a generalized compressor if it is continuous, piecewise monotonic with a finite number of pieces, and has bounded derivative over each piece. The inverse map $w^{-1}$ is called a generalized expander and is not necessarily a function.

As in ordinary companding, $w$ and $w^{-1}$ are used along with a uniform quantizer $q$ as $w^{-1}(q(w(x)))$. The restriction to a finite number of pieces is a limitation on the types of non-regular quantizers that can be captured with this model: those for which every quantizer cell is a finite union of regular cells (intervals). Barring certain pathological situations, this restriction is reasonable.

Along with setting relative sizes of cells, $w$ can now bin intervals together to provide for non-regularity. To illustrate this, let us briefly consider a simple example. Suppose that the pair $(X_1, X_2)$ is uniformly distributed over $[0, 1]^2$, variable-rate quantization is to be performed on both variables, and the function of interest is defined by

$$g(x_1, x_2) = x_1 \left( \tfrac{3}{4} - x_1 \right) (1 - x_2).$$

An optimal functional quantizer (a quantizer for $X_1$ to minimize $\mathrm{E}[(g(X_1, X_2) - g(\hat{X}_1, \hat{X}_2))^2]$) should bin together $X_1$ values that always yield the same $g(X_1, X_2)$. It can be seen from a plot of $g(x_1, x_2)$ (Fig. 7(a)) that the segment $X_1 \in [0, 3/8]$ is identical in this respect to the segment $X_1 \in [3/8, 3/4]$. This yields the constraint $w_1(x) = w_1(3/4 - x)$ for $x \in [0, 3/4]$. Furthermore, (19) sets the magnitude of the slope of $w_1$ in relation to the expected magnitude of the slope of $g$:

$$|w_1'(x_1)| \propto \left| \tfrac{3}{4} - 2 x_1 \right|.$$

This still leaves many choices for $w_1$, the most obvious being $w_1(x_1) = g(x_1, 0)$. A shifted and normalized version of this choice, along with the resulting quantizer, is drawn in Fig. 7(b).
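The binning behaviour of a generalized compressor can be sketched in a few lines. The code below reads the example function as $g(x_1, x_2) = x_1(3/4 - x_1)(1 - x_2)$ and takes $w_1(x) = g(x, 0)$, shifted and scaled to $[0, 1]$ before uniform quantization. Only the encoder side $q(w_1(x))$ is shown, since the generalized expander is multivalued; the names `w1` and `encode` and the choice $K = 32$ are assumptions made for illustration.

```python
def w1(x):
    """Candidate generalized compressor w_1(x) = g(x, 0) = x (3/4 - x)."""
    return x * (0.75 - x)

# Range of w1 on [0, 1]: minimum at x = 1 (-1/4), maximum at x = 3/8 (9/64).
W_MIN, W_MAX = w1(1.0), w1(0.375)

def encode(x, K):
    """Index of the uniform-quantizer cell containing the normalized compressor output."""
    u = (w1(x) - W_MIN) / (W_MAX - W_MIN)  # shift and scale into [0, 1]
    return min(int(u * K), K - 1)

K = 32
for x in (0.125, 0.25, 0.5, 0.625, 0.9):
    print(x, encode(x, K))
```

Inputs $x$ and $3/4 - x$ receive the same index, so each cell on $[0, 3/4]$ is a union of two intervals, while inputs in $(3/4, 1]$ fall in cells of their own.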

B. Equivalence-Free Functions 

We now define a broad class of functions for which regular quantization is optimal at sufficiently high resolutions. Consider the distributed functional scalar quantization problem for a function $g(X_1^n)$ defined on $[0, 1]^n$ subject to mean-squared error distortion. We will focus on the design of the $j$th quantizer.

We require a set of definitions:

Definition 4: For any $s \neq t$ in the support of $X_j$, let

$$v_j(s, t) = \mathrm{E}\left[ \operatorname{var}\left( g(X_1^n) \mid X_j \in \{s, t\}, \{X_i\}_{i \neq j} \right) \right].$$

If $v_j(s, t) = 0$ then $(s, t)$ is a functional equivalence in the $j$th variable. If $g$ has no functional equivalences in any of its variables, we say it is equivalence-free.
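Definition 4 can be evaluated numerically for the two-variable example of Section VI-A, read here as $g(x_1, x_2) = x_1(3/4 - x_1)(1 - x_2)$ with $(X_1, X_2)$ uniform on the unit square. Under that assumption, the two candidate values $s$ and $t$ of $X_1$ are equally likely given $X_1 \in \{s, t\}$, so the inner variance is $((g(s, x_2) - g(t, x_2))/2)^2$. The function name `v1` and the midpoint integration are illustrative choices.

```python
def g(x1, x2):
    """Example function from Section VI-A (as read here)."""
    return x1 * (0.75 - x1) * (1.0 - x2)

def v1(s, t, num_cells=1000):
    """v_1(s, t) for (X1, X2) uniform on [0, 1]^2, by midpoint integration over x2.

    Given X1 in {s, t} (equally likely under a uniform density) and X2 = x2,
    g takes two values whose variance is ((g(s, x2) - g(t, x2)) / 2)^2.
    """
    h = 1.0 / num_cells
    return sum(
        ((g(s, (i + 0.5) * h) - g(t, (i + 0.5) * h)) / 2.0) ** 2
        for i in range(num_cells)
    ) * h

print(v1(0.125, 0.625))  # mirror-image points about x1 = 3/8
print(v1(0.125, 0.25))   # points that g distinguishes
```

The first pair is a functional equivalence ($v_1 = 0$), so this $g$ is not equivalence-free; the second pair has strictly positive $v_1$.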

The theorem below demonstrates that for DFSQ with an equivalence-free function, quantizer regularity is asymptotically 
necessary for optimality. Specifically, non-regular quantization is shown to introduce a nonzero lower bound on the distortion, 
independent of rate. This is formalized with the aid of generalized companding. 

Theorem 10: Let $g$ be equivalence-free with respect to the distribution of $X_1^n$ on $[0, 1]^n$. Suppose quantization of each $X_j$ is performed as $Y_j = q(w_j(X_j))$ where $w_j$ is a generalized compressor and $q$ is a uniform quantizer. If there is an index $j$, closed interval $S$, and function $t : \mathbb{R} \to \mathbb{R}$ such that $P(X_j \in S) > 0$ and, for every $s \in S$, $s \neq t(s)$ and $w_j(s) = w_j(t(s))$, then the distortion has a positive, rate-independent lower bound.

Proof: See Appendix D. ■
The positive, rate-independent lower bound shows that the quantizer is suboptimal if the rate is sufficiently high; even naive uniform quantization will yield $D = \Theta(2^{-2R})$ dependence on rate and thus will eventually outperform the non-regular quantizer. It should be noted, however, that the rate above which a non-regular quantizer is necessarily suboptimal has not been specified here.

When a function has equivalences, the best asymptotic quantization tactic is to design compressors that bin all the equivalent 
values in each variable but are otherwise monotonic. 



VII. Don't-Care Intervals and Rate Amplification 

Ordinary high-resolution analysis produces point-density functions that reflect the source distribution in the sense that optimal 
quantizers never have zero point density where there is nonzero probability density. In fact, having zero point density where 
there is nonzero probability density would contradict the conditions that validate the high-resolution analysis. The situation is 
more complicated in the functional setting since the optimal point densities depend on both the functional sensitivity profiles 
and the source distributions. As foreshadowed by the qualifications in Theorem 8, having zero functional sensitivity where the probability density is nonzero changes the optimal quantizers in the variable-rate case.

The following example illustrates the potential for failure of the analysis of Section IV-C. Note that the intricacies arise even with a univariate function.

Example 5: Let $X$ have the uniform distribution over $[0, 1]$, and suppose the function of interest is $g(X) = \min(X, 1/2)$. It is clear that the optimal quantizer (for both fixed- and variable-rate) has uniform point density on $[0, 1/2]$. With the functional sensitivity profile given by

$$\gamma(x) = \begin{cases} 1, & \text{if } x < 1/2; \\ 0, & \text{otherwise,} \end{cases}$$

evaluating (10) and (12) is consistent with the intuitive result.

The distortion for the fixed-rate case obtained from (11) is $(1/12)(1/2)^3 2^{-2R}$. This is sensible since for half of the source values ($X > 1/2$) there is zero distortion by having a single codeword at 1/2, whereas for the other half of the source values ($X < 1/2$), $2^R - 1$ codewords quantize a random variable uniformly distributed over $[0, 1/2]$.

The variable-rate case is problematic. Since $\mathrm{E}[\log_2 \gamma(X)] = -\infty$, evaluating (13) yields $D \approx 0$. The high-resolution rate analysis does not apply because the quantization is not fine over the full support of $f_X$. (The high-resolution distortion analysis is valid, as we will establish formally in Section VII-A.) The performance is easily described by considering the first representation bit to specify the event $A = \{X < 1/2\}$ or its complement. Since additional bits are useful only when $A$ occurs, one can spend $2(R - 1)$ bits in those cases to have an average expenditure of $R$ bits. The resulting distortion is

$$D = P(A) \, D|_A + P(A^c) \, D|_{A^c} \approx \frac{1}{2} \cdot \frac{1}{12} \left( \frac{1}{2} \right)^2 2^{-2(2R - 2)} + \frac{1}{2} \cdot 0 = \frac{1}{6} \, 2^{-4R}.$$

Note that the exponent in the distortion-rate relationship has changed. □
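The two distortion expressions in Example 5 can be checked with a few lines of arithmetic. The sketch below assumes uniform cells on $[0, 1/2]$ with midpoint reconstruction, for which the per-cell mean-squared error is exactly (cell width)$^2/12$, and integer $R$; the function names are hypothetical.

```python
def fixed_rate_distortion(R):
    """Example 5, fixed rate: one codeword at 1/2 and 2^R - 1 uniform cells on [0, 1/2]."""
    cells = 2 ** R - 1
    width = 0.5 / cells
    return 0.5 * width ** 2 / 12.0   # P(X < 1/2) * (cell width)^2 / 12

def variable_rate_distortion(R):
    """Example 5, variable rate: 1 bit for the event A, then 2(R - 1) bits when A occurs."""
    cells = 2 ** (2 * (R - 1))
    width = 0.5 / cells
    return 0.5 * width ** 2 / 12.0

def average_rate(R):
    """Average bits per sample of the variable-rate scheme: 1 + P(A) * 2(R - 1)."""
    return 1.0 + 0.5 * 2 * (R - 1)

for R in (4, 6, 8):
    print(R, average_rate(R), fixed_rate_distortion(R), variable_rate_distortion(R))
```

Each extra bit of average rate lowers the fixed-rate distortion by about a factor of 4 and the variable-rate distortion by a factor of 16.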
In the example, there is an interval $X \in [1/2, 1]$ of source values that need not be distinguished for function evaluation. Let us define a term for such intervals before discussing the example further.

Definition 5: An interval $Z \subset [0, 1]$ is called a don't-care interval for the $j$th variable when the $j$th functional sensitivity $\gamma_j$ is identically zero on $Z$, but the probability $P(X_j \in Z)$ is positive.

In univariate FSQ, at high enough rates, each don't-care interval corresponding to a distinct value of the function should be allotted one codeword. This follows from reasoning similar to that given in Section VI-B and is illustrated by Example 5. In the fixed-rate case, the don't-care intervals simply occupy a few of the $2^R$ codewords and have a limited effect. In contrast, in the variable-rate case, the don't-care intervals produce a subset of source values that can be allotted very little rate. This gives more rate to be allotted outside the don't-care intervals and behavior we refer to as rate amplification. We demonstrate rate amplification for multivariate FSQ in Section VII-B after covering the distortion analysis and fixed-rate case in Section VII-A.



A. Distortion Analysis and Fixed-Rate Optimization 

In the following analysis we will assume that the $j$th variable has a finite number $M_j$ of don't-care intervals $\{Z_{j,1}, Z_{j,2}, \ldots, Z_{j,M_j}\}$. We also assume

$$P(X_j \in Z_j) < 1 \quad \text{for } j = 1, 2, \ldots, n, \qquad (27)$$

where $Z_j = \bigcup_{i=1}^{M_j} Z_{j,i}$ denotes the union of don't-care intervals for the $j$th variable. Without this, there is no improvement beyond $M_j$ levels in representing $X_j$, so the high-resolution approach is wholly inappropriate. We will denote the event $X_j \notin Z_j$ by $A_j$.

For fixed-rate DFSQ, the optimal operational distortion-rate expression (16) remains valid when variable $X_j$ has don't-care intervals, even though the optimal point density $\lambda_j$ obtained from (15) is zero where $f_{X_j}$ is nonzero (invalidating some arguments in Appendix B). Here we give an argument relying on an explicit characterization of the distortion similar to (14). At high enough rates, it is intuitive to allot a codeword of $Q_j$ to each don't-care interval $Z_{j,i}$. The remaining $K_j - M_j$ codewords are assigned optimally to $[0, 1] \setminus Z_j$ according to the basic theory developed in Section IV.

Theorem 11: Suppose $n$ sources $X_1^n \in [0, 1]^n$ are quantized in a distributed manner with a $K_j$-level quantizer with point density $\lambda_j$ applied to $X_j$. Further suppose that the sources, quantizers, and function $g : [0, 1]^n \to \mathbb{R}$ satisfy the assumptions of Section IV-A with the exception that HR1' need not hold for the $j$th variable where $\lambda_j = 0$. Finally, assume each source $X_j$ has $M_j$ don't-care intervals satisfying (27). Then the optimal point densities satisfy

$$\lambda_j(x) = 0 \quad \text{for all } x \in Z_j \qquad (28)$$

and yield

$$D \approx \sum_{j=1}^n \frac{P(A_j)}{12 (K_j - M_j)^2} \, \mathrm{E}\left[ \left( \frac{\gamma_j(X_j)}{\lambda_j(X_j)} \right)^2 \,\middle|\, A_j \right]. \qquad (29)$$

The optimal point densities for fixed-rate quantization are given by (28) inside the don't-care intervals and by (15) outside of the don't-care intervals. These point densities yield

$$D \approx \sum_{j=1}^n \frac{1}{12 (K_j - M_j)^2} \left\| \gamma_j^2 f_{X_j} \right\|_{1/3} \qquad (30)$$

$$\approx \frac{1}{12} \sum_{j=1}^n \left\| \gamma_j^2 f_{X_j} \right\|_{1/3} 2^{-2R_j}. \qquad (31)$$

Proof: See Appendix E.

B. Variable-Rate and Rate Amplification 

In the variable-rate case, it remains true that don't-care intervals should not be finely quantized (see (28)) and the distortion calculation (29) holds. The distinction from the fixed-rate case is that $X_j \in Z_j$ not only implies that the $j$th variable has limited impact on the distortion, but also that it can be allocated very little rate.

To formalize the analysis, we define discrete random variables to represent the events of source variables lying in don't-care 
intervals. 

Definition 6: The random variable

$$I_j = \begin{cases} i, & \text{if } X_j \in Z_{j,i} \text{ for } i \in \{1, 2, \ldots, M_j\}; \\ 0, & \text{otherwise} \end{cases}$$

is called the $j$th don't-care variable. The previously-defined event $A_j$ can be expressed as $\{I_j = 0\}$.

At high enough rates, the $j$th encoder communicates $I_j$ and in addition, only when $I_j = 0$, a fine quantization of $X_j$. The resulting performance is summarized by the following theorem.

Theorem 12: Under the conditions of Theorem 11, the optimal point densities for variable-rate quantization follow (19) and yield

$$D \approx \frac{1}{12} \sum_{j=1}^n \| \gamma_j \|_1^2 \times 2^{-2 \alpha_j (R_j - H(I_j)) + 2h(X_j \mid A_j) + 2\mathrm{E}[\log_2 (\gamma_j(X_j)) \mid A_j]}, \qquad (32)$$

where $\alpha_j = 1/P(A_j)$ is the amplification of $R_j$.

Proof: See Appendix F. ■
Some remarks:
Some remarks: 

1) The quantity $H(I_j)$ may be identified as the cost of communicating the indicator information to the decoder. The remaining rate, $R_j - H(I_j)$, is amplified by factor $\alpha_j$ because additional description of $X_j$ is useful only when $X_j \notin Z_j$. The amplification shows that the standard $-6$ dB/bit distortion decay may be exceeded in the presence of don't-care regions.

2) At moderate rates, it may not be optimal to communicate $I_j$ losslessly, and it may be beneficial to include $X_j$ values with small but positive $\gamma_j$ in don't-care intervals. Study of this topic is left for specific applications.

Fig. 8. Suppose the encoder for $X_2$ could send a bit to the encoder for $X_1$. Is there any benefit? How does it compare to sending an additional bit to the decoder?

3) The rate amplification we have seen in the variable-rate case and the relative lack of importance of don't-care intervals in the fixed-rate case have a close analogy in ordinary lossy source coding. Suppose a source $X$ is a mixed random variable with an $M$-value discrete component and a continuous component. High-resolution quantization of $X$ will allocate one level to each discrete value and the remaining levels to the continuous component. The discrete component changes the constant factor in $\Theta(2^{-2R})$ fixed-rate operational distortion-rate performance while it changes the decay rate in the variable-rate case. See [30] for related rate-distortion (rather than high-resolution quantization) results.
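The two quantities that drive rate amplification, $H(I_j)$ and $\alpha_j = 1/P(A_j)$, depend only on the probabilities of the don't-care intervals. The sketch below computes them and the amplified rate $\alpha_j (R_j - H(I_j))$ from remark 1. The function names are hypothetical, and the input list of interval probabilities is the only assumption about the source.

```python
import math

def dont_care_entropy(interval_probs):
    """Entropy H(I_j) in bits; I_j = i on the ith don't-care interval and 0 otherwise."""
    p_active = 1.0 - sum(interval_probs)          # P(A_j) = P(I_j = 0)
    probs = [p_active] + list(interval_probs)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def amplified_rate(R, interval_probs):
    """Effective rate alpha_j * (R_j - H(I_j)) available for fine quantization on A_j."""
    alpha = 1.0 / (1.0 - sum(interval_probs))     # alpha_j = 1 / P(A_j)
    return alpha * (R - dont_care_entropy(interval_probs))

# Example 5: a single don't-care interval [1/2, 1] with probability 1/2.
print(dont_care_entropy([0.5]), amplified_rate(6, [0.5]))
```

For Example 5 this gives $H(I) = 1$ bit and an amplified rate of $2(R - 1)$, matching the allocation described there.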



VIII. Chatting Encoders 

Our final variation on the basic theory of distributed functional scalar quantization is to allow limited communication between 
the encoders. How much can the distortion be reduced via this communication? Echoing the results of the previous section, 
we will find dramatically different answers in the fixed- and variable-rate cases. 

For notational convenience, we will fix the communication to be from encoder 2 to encoder 1 though the number of source variables $n$ remains general. In accordance with the block diagram of Fig. 8, the information $Y = Y_{2 \to 1}$ must be conditionally independent of $X_1$ given $X_2$. We first consider the case where $Y$ is a single bit.

In this section, we express the functional distortion as

$$D = \frac{1}{12} \sum_{j=1}^n D_j,$$

where various expressions for $D_j$ have been found for different scenarios, including (16), (20), (31), and (32). At issue is how $D_1$ is affected by $Y$; the other $D_j$'s are obviously not affected.



A. Fixed-Rate Quantization 

In general, the availability of a single bit $Y$ causes one to choose between two potentially-different quantizers $Q_{1|Y=0}$ and $Q_{1|Y=1}$ in the quantization of $X_1$. We express the optimal quantizers and the resulting distortion contribution $D_1$ by way of the following concept.

Definition 7: The $j$th conditional functional sensitivity profile of $g$ given $Y = y$ is defined as

$$\gamma_{j|Y}(x \mid y) = \left( \mathrm{E}\left[ |g_j(X_1^n)|^2 \mid X_j = x, Y = y \right] \right)^{1/2},$$

where $g_j(x_1^n)$ denotes $\partial g(x_1^n) / \partial x_j$.

Now several results follow by analogy with Theorem 8. For the case of $Y = y$, the optimal point density is given by

$$\lambda_{1|Y}(x \mid y) = \frac{\left( \gamma_{1|Y}^2(x \mid y) \, f_{X_1|Y}(x \mid y) \right)^{1/3}}{\int_0^1 \left( \gamma_{1|Y}^2(t \mid y) \, f_{X_1|Y}(t \mid y) \right)^{1/3} dt},$$

resulting in conditional distortion contribution

$$\frac{1}{12 K_1^2} \left\| \gamma_{1|Y=y}^2 \, f_{X_1|Y=y} \right\|_{1/3}.$$

Combining the two possibilities for $Y$ via total expectation gives

$$D_1 = \sum_{y=0}^1 P(Y = y) \left\| \gamma_{1|Y=y}^2 \, f_{X_1|Y=y} \right\|_{1/3}. \qquad (33)$$

From this expression we reach an important conclusion on the effect of the chatting bit $Y$.

Theorem 13: For fixed-rate quantization, communication of one bit of information from encoder 2 to encoder 1 can at most reduce $D_1$ by a factor of 4.

Fig. 9. Illustration for Example 6. Shown is the unit square $[0, 1]^2$ with quadrants marked with the value of $g_1(x_1, x_2)$.



Proof: From Theorem 8, the distortion contribution analogous to (33) without the chatting bit $Y$ is $\|\gamma_1^2 f_{X_1}\|_{1/3}$. Thus the
fact we wish to prove is a statement about $\mathcal{L}^{1/3}$ pseudonorms of surrogate densities and their conditional forms.
We proceed as follows:

$$
\begin{aligned}
D_1 &= \sum_{y=0}^{1} \left\| P(Y=y)\, \gamma_{1|Y}^2(x \mid y)\, f_{X_1|Y}(x \mid y) \right\|_{1/3} \\
&\overset{(a)}{\geq} \frac{1}{4} \left\| \sum_{y=0}^{1} P(Y=y)\, \gamma_{1|Y}^2(x \mid y)\, f_{X_1|Y}(x \mid y) \right\|_{1/3} \\
&\overset{(b)}{=} \frac{1}{4} \left\| f_{X_1}(x) \sum_{y=0}^{1} \frac{P(Y=y)\, f_{X_1|Y}(x \mid y)}{f_{X_1}(x)}\, \gamma_{1|Y}^2(x \mid y) \right\|_{1/3} \\
&\overset{(c)}{=} \frac{1}{4} \left\| f_{X_1}(x)\, \gamma_1^2(x) \right\|_{1/3}.
\end{aligned}
$$

Step (a) uses a quasi-triangle inequality stated and proved in Appendix G; (b) is an application of Bayes's rule; and (c) is
based on an evaluation of the (unconditional) functional sensitivity via the total expectation theorem with conditioning on $Y$.
This proves the theorem. ■
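
The bound can be sanity-checked numerically. The following is a minimal sketch and not part of the original analysis: it discretizes $[0,1]$ into equal cells, draws arbitrary conditional densities and conditional sensitivity profiles, and compares the contribution (33) with the no-chatting contribution $\|\gamma_1^2 f_{X_1}\|_{1/3}$. The grid size, the prior on $Y$, the random profiles, and all function names are illustrative assumptions.

```python
import random


def pseudonorm_one_third(values, dx):
    """L^{1/3} pseudonorm of a nonnegative step function: (integral of h^(1/3))^3."""
    return (sum(v ** (1.0 / 3.0) for v in values) * dx) ** 3


def random_density(rng, n_cells, dx):
    """Arbitrary positive step density on [0, 1] (assumed shape, illustration only)."""
    raw = [rng.uniform(0.1, 1.0) for _ in range(n_cells)]
    total = sum(raw) * dx
    return [r / total for r in raw]


def chatting_contributions(n_cells=200, seed=0):
    rng = random.Random(seed)
    dx = 1.0 / n_cells
    p_y = [0.3, 0.7]  # assumed P(Y = 0), P(Y = 1)
    surrogates = []
    for y in (0, 1):
        f_cond = random_density(rng, n_cells, dx)                    # f_{X1|Y}(x | y)
        gamma_sq = [rng.uniform(0.0, 5.0) for _ in range(n_cells)]   # gamma_{1|Y}^2(x | y)
        surrogates.append([p_y[y] * g * f for g, f in zip(gamma_sq, f_cond)])
    # With chatting: sum over y of P(Y=y) * ||gamma^2 f||_{1/3},
    # using the homogeneity of the pseudonorm to move P(Y=y) inside.
    d_chat = sum(pseudonorm_one_third(h, dx) for h in surrogates)
    # Without chatting: f_{X1} * gamma_1^2 is the pointwise sum of the two terms.
    combined = [h0 + h1 for h0, h1 in zip(surrogates[0], surrogates[1])]
    d_no_chat = pseudonorm_one_third(combined, dx)
    return d_chat, d_no_chat


d_chat, d_no_chat = chatting_contributions()
print(d_no_chat / d_chat)  # reduction factor obtained from the chatting bit
```

Under these assumptions the printed reduction factor should not exceed 4 for any seed, in line with Theorem 13.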

Note that the result of Theorem 13 may be iterated for multiple bits of side information $Y$ and that a factor of 4 reduction
in $D_1$ per bit of communication may be guaranteed if the bits are instead put towards communication from encoder 1 to the
decoder. These observations yield the following corollary.

Corollary 14: For fixed-rate functional quantization, communication of R additional bits between encoders performs at best 
as well as communication of R additional bits to the centralized decoder. 

In general, the idea that bits from encoder 2 to encoder 1 are as good as bits from encoder 1 to the decoder is optimistic.
In particular, if $\mathrm{E}[\gamma_1^2(X_1)] > 0$, then $D_1$ is bounded away from zero for any amount of communication from encoder 2 to
encoder 1.



B. Variable-Rate Quantization 

In a variable-rate scenario, the rate $R_1$ could be made to depend on the chatting bit $Y$, introducing a bit allocation problem
between the cases of $Y = 0$ and $Y = 1$. Even without such dependence, we can demonstrate that the bit $Y$ can reduce the
first variable's contribution to the functional distortion by an arbitrary factor.

Analogous to (33),

$$ D_1 = \sum_{y=0}^{1} P(Y=y)\, \|\gamma_{1|Y=y}\|_1^2\, 2^{2h(X_1 \mid Y=y) + 2\mathrm{E}[\log_2 \gamma_{1|Y=y}(X_1)]} \tag{34} $$

by comparison with (20). In contrast to the $\mathcal{L}^{1/3}$ pseudonorms in (33), this linear combination can be arbitrarily smaller than

$$ \|\gamma_1\|_1^2\, 2^{2h(X_1) + 2\mathrm{E}[\log_2 \gamma_1(X_1)]}. $$

We demonstrate this through a simple example.

Example 6: Let sources $X_1$ and $X_2$ be uniformly distributed on $[0,1]^2$. We specify the function of interest $g$ through its
partial derivatives. Let $g_2(x_1, x_2) = 1$ for all $(x_1, x_2)$ and let $g_1(x_1, x_2)$ be piecewise constant as shown in Fig. 9, where $L$
is a positive constant.

We can easily derive the first functional sensitivity profile of $g$ to be

$$ \gamma_1(x) = \sqrt{\tfrac{1}{2}(L^2 + 1)}. $$

This also allows us to find the distortion contribution factor $D_1$ without chatting to be

$$ D_1 = \tfrac{1}{4}(L^2 + 1)^2. $$

In this example, one bit about $X_2$ is enough to allow the encoder for $X_1$ to perfectly tailor its point density to match the
sensitivity of $g$ at $(X_1, X_2)$. Of course, the chatting bit should simply be

$$ Y = \begin{cases} 0, & \text{if } X_2 > 1/2; \\ 1, & \text{otherwise.} \end{cases} $$

The first conditional functional sensitivity profiles for $g$ are then

$$ \gamma_{1|Y}(x \mid y) = \begin{cases} 1, & \text{for } Y = 0 \text{ and } X_1 < 1/2 \text{ or } Y = 1 \text{ and } X_1 > 1/2; \\ L, & \text{otherwise.} \end{cases} $$

Now for either value of $y$, we have $\int \gamma_{1|Y}(x \mid y)\, dx = \tfrac{1}{2}(L+1)$ and $\mathrm{E}[\log_2 \gamma_{1|Y=y}(X_1)] = \tfrac{1}{2}\log_2 L$. Thus, evaluating (34)
gives

$$ D_1 = \tfrac{1}{4}(L+1)^2 L. $$

This is smaller than the $D_1$ with no chatting by about a factor of $L$. The performance gap can be made arbitrarily large by
increasing $L$, all from a single bit of information communicated between encoders. □
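
The arithmetic of Example 6 can be reproduced with a few lines of code. This is a minimal sketch that simply evaluates the two expressions as they are stated above, taking $h(X_1) = h(X_1 \mid Y = y) = 0$ for the uniform source; the function names and the chosen values of $L$ are illustrative assumptions.

```python
import math


def d1_without_chat(L):
    """||gamma_1||_1^2 * 2^(2 h + 2 E[log2 gamma_1]) with constant gamma_1 on [0, 1]."""
    gamma = math.sqrt(0.5 * (L ** 2 + 1))
    norm1 = gamma                 # integral of a constant over [0, 1]
    e_log = math.log2(gamma)      # E[log2 gamma_1(X_1)]
    h = 0.0                       # differential entropy of a uniform source on [0, 1]
    return norm1 ** 2 * 2 ** (2 * h + 2 * e_log)


def d1_with_chat(L):
    """Evaluate (34): for each y the profile equals 1 on one half and L on the other."""
    norm1 = 0.5 * (1 + L)                               # integral of gamma_{1|Y}(x|y)
    e_log = 0.5 * math.log2(1) + 0.5 * math.log2(L)     # E[log2 gamma_{1|Y=y}(X_1)]
    h = 0.0                                             # X_1 stays uniform given Y
    per_y = norm1 ** 2 * 2 ** (2 * h + 2 * e_log)
    return 0.5 * per_y + 0.5 * per_y                    # P(Y=0) = P(Y=1) = 1/2


for L in (2, 10, 100):
    print(L, d1_without_chat(L) / d1_with_chat(L))
```

The printed ratio grows roughly in proportion to $L$, matching the statement that the gap can be made arbitrarily large.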



C. Comparison with Non-Functional Source Coding 

The results of this section are strikingly different from those of ordinary source coding. Consider first the discrete scenario 
in which we wish to recreate $X_1^n$ perfectly at the decoder. Can communication between encoders enable a reduction in the
rate of communication to the decoder? According to Slepian and Wolf, the answer is a resounding "no." Even in the case of 
unlimited collaboration via fused encoders, the minimum sum rate to the decoder remains unchanged. 

How about in lossy source coding? If quantization is variable-rate and Slepian-Wolf coding is employed on the quantization
indices, no gains are possible from encoder interactions. This is a consequence of the work of Rebollo-Monedero et al. J9) on 
high-resolution Wyner-Ziv coding, where it is shown that there is no gain from supplying the source encoder with the decoder 
side information. 



IX. Summary 

We have developed asymptotically-optimal designs of functional quantizers using high-resolution quantization theory. This 
has shown that accounting for a function while quantizing a source can lead to arbitrarily large improvements in distortion. In 
certain scenarios (Section V), this improvement can grow exponentially with the number of sources. In others (Section VII),
it can grow exponentially with rate.

Additionally, our study of functional quantization has highlighted some striking distinctions between fixed- and variable-rate 
cases: 

1) For certain simple functions of order statistics, distortion relative to ordinary quantization falls polynomially with the 
number of sources in the fixed-rate case, whereas in the variable-rate case it falls exponentially. 

2) The distortion associated with fixed-rate quantizers will always exhibit $-6$ dB/bit rate dependence at high rates, whereas
the decay of distortion can be faster in some variable-rate cases. 

3) Information sent from encoder-to-encoder can lead to arbitrarily-large improvements in distortion for variable-rate, 
whereas for fixed-rate this information can be no more useful than if it were sent to the decoder. 

The second and third of these have extensions or analogues beyond functional quantization. Rate amplification is a feature 
of quantizing sources with mixed distributions, and the results on chatting encoders continue to hold when the function g is 
the identity operation.

Appendix A
Proof of Theorem 4



The distortion can be written as

$$ D = \mathrm{E}\left[(g(X) - g(\widehat{X}))^2\right] = \sum_{i \in \mathcal{I}} \mathrm{E}\left[(g(X) - g(\beta_i))^2 \mid X \in S_i\right] P(X \in S_i) \tag{35} $$

by the law of total expectation. The desired expression of Theorem 4 will be obtained by approximating the well-behaved terms in
(35) and showing that the remaining terms can be safely ignored. Let $\mathcal{B} \subset \mathcal{I}$ be comprised of the indices $i$ for which $g'(x)$ and
$g''(x)$ are bounded for all $x \in S_i$.

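Well-behaved terms: Let $i \in \mathcal{B}$. Then for $x \in S_i$,

$$ g(x) = g(\beta_i) + g'(\beta_i)(x - \beta_i) + R_i(x), $$

where $R_i(x)$ is a remainder function bounded pointwise by ©. Now expand the conditional expectation in (35) as

$$
\begin{aligned}
\mathrm{E}\left[(g(X) - g(\beta_i))^2 \mid X \in S_i\right]
&= \mathrm{E}\left[(g'(\beta_i)(X - \beta_i))^2 \mid X \in S_i\right] \\
&\quad + \mathrm{E}\left[2g'(\beta_i)(X - \beta_i)R_i(X) + R_i^2(X) \mid X \in S_i\right].
\end{aligned}
$$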

The first term is easily approximated by high-resolution analysis and we wish to show that the second is asymptotically 
negligible. 

For the first term, note that the approximate linearity of $g$ on $S_i$ implies we should place $\beta_i$ at the center of $S_i$. Furthermore,
the length of $S_i$ is approximately $(K\lambda(\beta_i))^{-1}$ and $X$ is conditionally approximately uniform on $S_i$. Thus the first term is
approximately $\tfrac{1}{12}(g'(\beta_i))^2 (K\lambda(\beta_i))^{-2}$.

To bound the second term, note that

$$ \left| 2g'(\beta_i)(x - \beta_i)R_i(x) + R_i^2(x) \right| $$

has a uniform $O(\mathrm{length}(S_i)^3)$ bound for $x \in S_i$; this follows from the bounded derivatives in Assumption UF2 and bound
©. This makes the second term negligible in comparison to the first term, which is $\Theta(\mathrm{length}(S_i)^2)$.

Other terms: We now wish to show that the $i \in \mathcal{I} \setminus \mathcal{B}$ terms in (35) can be safely ignored. We do not have differentiability
at $x \in S_i$, but the continuity of $g$ prevents anything too bad from happening. Continuity on a closed interval implies uniform
continuity, so there exists a finite constant $c$ such that $|g(x) - g(\beta_i)| < c|x - \beta_i|$ for $x \in S_i$. The conditional expectation is
thus bounded by $c^2 (\mathrm{length}(S_i))^2$, and

$$ \sum_{i \in \mathcal{I} \setminus \mathcal{B}} \mathrm{E}\left[(g(X) - g(\beta_i))^2 \mid X \in S_i\right] P(X \in S_i) < |\mathcal{I} \setminus \mathcal{B}| \cdot c^2 \cdot \max_i\, (\mathrm{length}(S_i))^2 \tag{36} $$

by replacing each term with an upper bound that does not even account for $P(X \in S_i) \ll 1$. Thus at high resolution these
terms may be ignored.

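Final expression: We are now left with

$$
\begin{aligned}
D &\approx \sum_{i \in \mathcal{B}} \frac{1}{12}(g'(\beta_i))^2 (K\lambda(\beta_i))^{-2} P(X \in S_i) \\
&= \frac{1}{12K^2} \sum_{i \in \mathcal{B}} (g'(\beta_i)/\lambda(\beta_i))^2 P(X \in S_i) \\
&\overset{(a)}{\approx} \frac{1}{12K^2} \int_{x \in S_i,\, i \in \mathcal{B}} (\gamma(x)/\lambda(x))^2 f_X(x)\, dx \\
&\overset{(b)}{\approx} \frac{1}{12K^2} \mathrm{E}\left[(\gamma(X)/\lambda(X))^2\right],
\end{aligned}
$$

where (a) is a standard high-resolution approximation; and (b) follows from $P(X \in \bigcup_{i \in \mathcal{I} \setminus \mathcal{B}} S_i) \to 0$ and

Appendix B
Proof of Theorem 6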

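We wish to develop the estimate (14), which relates functional distortion to functional sensitivity profiles. The introduction
of functional sensitivity profiles is motivated by the Taylor series approximation of $g$. Our main task is thus to show that the
error in Taylor series approximation becomes negligible under the high-resolution assumptions of Section IV-A.

Recall the notation from Section IV-A. For $j \in \{1, 2, \ldots, n\}$, the quantization points of quantizer $Q_j$ are denoted
$\{\beta_i^{(j)}\}_{i \in \mathcal{I}^{(j)}}$ and the partition cells are denoted $\{S_i^{(j)}\}_{i \in \mathcal{I}^{(j)}}$. To simplify expressions below, let

$$ \beta_{i_1^n} = \left(\beta_{i_1}^{(1)}, \beta_{i_2}^{(2)}, \ldots, \beta_{i_n}^{(n)}\right), $$

$$ S_{i_1^n} = S_{i_1}^{(1)} \times S_{i_2}^{(2)} \times \cdots \times S_{i_n}^{(n)}, $$

and

$$ \mathcal{I}_1^n = \mathcal{I}^{(1)} \times \mathcal{I}^{(2)} \times \cdots \times \mathcal{I}^{(n)}. $$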

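Then we can express the distortion as

$$
\begin{aligned}
D &= \mathrm{E}\left[(g(X_1^n) - g(\widehat{X}_1^n))^2\right] \\
&= \sum_{i_1^n \in \mathcal{I}_1^n} \mathrm{E}\left[(g(X_1^n) - g(\beta_{i_1^n}))^2 \mid X_1^n \in S_{i_1^n}\right] P(X_1^n \in S_{i_1^n})
\end{aligned} \tag{37}
$$

by the law of total expectation. We wish to approximate the conditional expectations in this sum based on linear approximation
of $g$ within cell $S_{i_1^n}$, and we require vanishing relative error as resolution increases.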
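By Taylor's theorem,

$$ g(x_1^n) = g(\beta_{i_1^n}) + \sum_{j=1}^{n} g_j(\beta_{i_1^n})\,(x_j - \beta_{i_j}^{(j)}) + R_{i_1^n}(x_1^n), \tag{38} $$

where $g_j$ denotes $\partial g / \partial x_j$ and the remainder term $R_{i_1^n}(x_1^n)$ is small near $\beta_{i_1^n}$. Specifically,

$$ |R_{i_1^n}(x_1^n)| \leq \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} \max_{\xi_1^n} |g_{jk}(\xi_1^n)|\, |x_j - \beta_{i_j}^{(j)}|\, |x_k - \beta_{i_k}^{(k)}|, $$

where $g_{jk}(x_1^n)$ denotes $\partial^2 g(x_1^n) / \partial x_j \partial x_k$ and the maximum is over $\xi_1^n$ on the line connecting $x_1^n$ and $\beta_{i_1^n}$. Assumption MF2
(that $g$ is twice continuously differentiable) and the compactness of $[0,1]^n$ imply that there is an upper bound

$$ |g_{jk}(x_1^n)| < c \quad \text{for all } x_1^n \in [0,1]^n. $$

Thus, for $x_1^n \in S_{i_1^n}$,

$$ |R_{i_1^n}(x_1^n)| < \frac{1}{2} c n^2 \max_j \left(\mathrm{length}^2(S_{i_j}^{(j)})\right). \tag{39} $$

Using the Taylor expansion with remainder,

$$
\begin{aligned}
\mathrm{E}&\left[(g(X_1^n) - g(\beta_{i_1^n}))^2 \mid X_1^n \in S_{i_1^n}\right] \\
&= \mathrm{E}\left[\Big(\sum_{j=1}^{n} g_j(\beta_{i_1^n})(X_j - \beta_{i_j}^{(j)})\Big)^2 \,\Big|\, X_1^n \in S_{i_1^n}\right] \\
&\quad + 2\,\mathrm{E}\left[R_{i_1^n}(X_1^n) \sum_{j=1}^{n} g_j(\beta_{i_1^n})(X_j - \beta_{i_j}^{(j)}) \,\Big|\, X_1^n \in S_{i_1^n}\right] \\
&\quad + \mathrm{E}\left[R_{i_1^n}^2(X_1^n) \mid X_1^n \in S_{i_1^n}\right].
\end{aligned} \tag{40}
$$

Paralleling the development in Appendix A, the first term yields the desired approximation and the second and third terms are
asymptotically negligible.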

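The first term can be evaluated under the assumptions of Section IV-A. Since $f_{X_1^n}$ is approximately constant on $S_{i_1^n}$
(Assumption HR1') and the function is approximately affine on $S_{i_1^n}$, we can take $\beta_{i_1^n}$ to be the center of $S_{i_1^n}$. Then, conditioned
on $X_1^n \in S_{i_1^n}$, we have that the random variables $\{X_j - \beta_{i_j}^{(j)}\}_{j=1}^{n}$ are mutually uncorrelated. So all cross terms in the conditional
expectation are zero, and

$$
\begin{aligned}
\mathrm{E}\left[\Big(\sum_{j=1}^{n} g_j(\beta_{i_1^n})(X_j - \beta_{i_j}^{(j)})\Big)^2 \,\Big|\, X_1^n \in S_{i_1^n}\right]
&= \sum_{j=1}^{n} g_j^2(\beta_{i_1^n})\, \mathrm{E}\left[(X_j - \beta_{i_j}^{(j)})^2 \mid X_1^n \in S_{i_1^n}\right] && (41) \\
&= \sum_{j=1}^{n} \frac{1}{12} \left(\frac{g_j(\beta_{i_1^n})}{K_j \lambda_j(\beta_{i_j}^{(j)})}\right)^2, && (42)
\end{aligned}
$$

where the last step uses the length of $S_{i_j}^{(j)}$ in the usual way.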

Bounding the second and third terms of (40) is easy because we have already shown that $|R_{i_1^n}(x_1^n)| = O(\ell^2)$ on $S_{i_1^n}$, where
$\ell$ denotes the maximum of the lengths of the sides of $S_{i_1^n}$. We thus have that the second and third terms of (40) are together
$O(\ell^3)$, which is negligible since the distortions we obtain are $\Theta(\ell^2)$.

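Having simplified (40) to (42), we substitute in (37) to make the final computations:

$$
\begin{aligned}
D &\approx \sum_{j=1}^{n} \frac{1}{12K_j^2} \sum_{i_1^n \in \mathcal{I}_1^n} \left(g_j(\beta_{i_1^n}) / \lambda_j(\beta_{i_j}^{(j)})\right)^2 P(X_1^n \in S_{i_1^n}) \\
&\overset{(a)}{\approx} \sum_{j=1}^{n} \frac{1}{12K_j^2} \sum_{i_1^n \in \mathcal{I}_1^n} \mathrm{E}\left[\left(g_j(X_1^n) / \lambda_j(X_j)\right)^2 \mid X_1^n \in S_{i_1^n}\right] P(X_1^n \in S_{i_1^n}) \\
&\overset{(b)}{=} \sum_{j=1}^{n} \frac{1}{12K_j^2} \int_{[0,1]^n} \left(g_j(x_1^n) / \lambda_j(x_j)\right)^2 f_{X_1^n}(x_1^n)\, dx_1^n \\
&\overset{(c)}{=} \sum_{j=1}^{n} \frac{1}{12K_j^2}\, \mathrm{E}\left[\left(\frac{\gamma_j(X_j)}{\lambda_j(X_j)}\right)^2\right],
\end{aligned}
$$

where (a) uses that at high resolution, $g_j$ and $\lambda_j$ are approximately constant in a partition cell; (b) is the standard association
of a sum with an integral; and (c) follows from first integrating over the $n-1$ variables excluding $j$ to get squared functional
sensitivity profiles in the integrand and then integrating over variable $j$.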
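
The final estimate can be compared against a direct numerical evaluation on a toy case. The sketch below is an illustration under assumptions that are not in the original: it takes $g(x_1, x_2) = x_1 x_2$ with independent uniform sources on $[0,1]$, uniform $K$-cell quantizers for both variables (so $\lambda_j = 1$ and $\gamma_j^2 = \mathrm{E}[X_k^2] = 1/3$), reconstruction points at the cell centers, and a midpoint rule for the integral. All function names are hypothetical.

```python
def numeric_distortion(K, m=20):
    """Midpoint-rule value of E[(X1*X2 - b1*b2)^2] for uniform K-cell quantizers on [0, 1]."""
    cell = 1.0 / K          # cell length; reconstruction points are the cell centers
    sub = cell / m          # integration step inside a cell
    total = 0.0
    for i1 in range(K):
        b1 = (i1 + 0.5) * cell
        for i2 in range(K):
            b2 = (i2 + 0.5) * cell
            for a in range(m):
                x1 = i1 * cell + (a + 0.5) * sub
                for b in range(m):
                    x2 = i2 * cell + (b + 0.5) * sub
                    total += (x1 * x2 - b1 * b2) ** 2
    return total * sub * sub  # the joint density equals 1 on the unit square


def high_resolution_estimate(K):
    """Sum over j of E[(gamma_j / lambda_j)^2] / (12 K^2), with gamma_j^2 = 1/3 and lambda_j = 1."""
    return 2 * (1.0 / 3.0) / (12.0 * K ** 2)


for K in (4, 8, 16):
    print(K, numeric_distortion(K) / high_resolution_estimate(K))
```

For this toy function the ratio should approach 1 as $K$ grows, up to the small error of the midpoint rule, which is the behavior the high-resolution argument predicts.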

Appendix C 
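Proof of Theorem 7

To minimize functional MSE, the optimal estimator clearly should compute the conditional expectation of $g(X_1^n)$ given the
received codewords:

$$ \widehat{g}(\beta_{i_1^n}) = \mathrm{E}\left[g(X_1^n) \mid X_1^n \in S_{i_1^n}\right], $$

where we have used notation from Appendix B. In analogy to (37),

$$ D_{\mathrm{opt}} = \sum_{i_1^n \in \mathcal{I}_1^n} \mathrm{E}\left[(g(X_1^n) - \widehat{g}(\beta_{i_1^n}))^2 \mid X_1^n \in S_{i_1^n}\right] P(X_1^n \in S_{i_1^n}). \tag{43} $$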

Our goal is to show that $D - D_{\mathrm{opt}}$ is asymptotically negligible, and we will do this by subtracting (43) from (37) and
bounding each term. For this, we would like the conditional expectation of

$$
\begin{aligned}
&(g(X_1^n) - g(\beta_{i_1^n}))^2 - (g(X_1^n) - \widehat{g}(\beta_{i_1^n}))^2 \\
&\quad = \underbrace{(g(\beta_{i_1^n}) - \widehat{g}(\beta_{i_1^n}))}_{A} \Big( \underbrace{(g(\beta_{i_1^n}) - g(X_1^n))}_{B} + \underbrace{(\widehat{g}(\beta_{i_1^n}) - g(X_1^n))}_{C} \Big)
\end{aligned}
$$

to be small. We would like to obtain an $o(\ell^2)$ bound, where $\ell$ is the maximum of the lengths of the sides of $S_{i_1^n}$ as in
Appendix B; since the distortion is $\Theta(\ell^2)$, this will make the suboptimality of the estimator $g$ negligible.

To bound $A$, we first determine how close $\widehat{g}(\beta_{i_1^n})$ is to $g(\beta_{i_1^n})$, using the Taylor expansion with remainder derived in
Appendix B. Computing the conditional expectation of (38) gives

$$ \widehat{g}(\beta_{i_1^n}) = g(\beta_{i_1^n}) + \int_{S_{i_1^n}} R_{i_1^n}(x_1^n)\, f_{X_1^n \mid \{X_1^n \in S_{i_1^n}\}}(x_1^n)\, dx_1^n, $$

where the first-order terms integrate to zero because of Assumption HR1'. Now we can use (39) to conclude $|A| < \tfrac{1}{2} c n^2 \ell^2$.

To bound $B$ and $C$, note that the conditions on $g$ imply that it is Lipschitz continuous under any metric on the domain.
Denoting the Lipschitz constant by $L$ and using the $\infty$-norm for convenience, we obtain $|B| < L\ell$. Also, using the intermediate
value theorem to argue that $\widehat{g}(\beta_{i_1^n}) = g(\xi_1^n)$ for some $\xi_1^n \in S_{i_1^n}$, we obtain $|C| < L\ell$.

Putting our calculations together, every conditional expectation that appears in the term-by-term difference between (43) and
(37) is bounded by $(\tfrac{1}{2} c n^2 \ell^2)(2L\ell)$. Thus

$$ D - D_{\mathrm{opt}} < c n^2 L \ell^3, $$

which is asymptotically negligible.



Appendix D 
Proof of Theorem 10

The theorem asserts that when the function is equivalence-free, $w_j$ failing to be one-to-one on the support of $X_j$ creates a
component of the distortion that cannot be eliminated by quantizing more finely. The proof here lower-bounds the distortion
by focusing on the contribution from just the $j$th variable. The bound is especially crude because it is based on observing
$\{X_i\}_{i \neq j}$ and $w_j(X_j)$ without quantization and it uses only the contribution from $X_j \in S \cup t(S)$.

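We wish to first bound the functional distortion in terms of a contribution from the $j$th variable:

$$
\begin{aligned}
D &\overset{(a)}{\geq} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid Y_1^n)\right] \\
&\overset{(b)}{\geq} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid Y_j, \{X_i\}_{i \neq j})\right] \\
&\overset{(c)}{\geq} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid w_j(X_j), \{X_i\}_{i \neq j})\right] \\
&\overset{(d)}{=} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid w_j(X_j), \{X_i\}_{i \neq j}) \mid A\right] P(A) \\
&\qquad + \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid w_j(X_j), \{X_i\}_{i \neq j}) \mid A^c\right] P(A^c) \\
&\overset{(e)}{\geq} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid w_j(X_j), \{X_i\}_{i \neq j}) \mid A\right] P(A),
\end{aligned} \tag{44}
$$

where $A$ is the event $X_j \in S \cup t(S)$. Step (a) will hold with equality when the optimal estimate (the conditional expectation
of $g(X_1^n)$ given the quantized values) is used; (b) holds because, for each $i \neq j$, $Y_i$ is a function of $X_i$; (c) holds because $Y_j$
is a function of $w_j(X_j)$; (d) is an application of the total expectation theorem; and (e) holds because the discarded term is
nonnegative. It remains to use the hypotheses of the theorem to bound the conditional variance in the final expression.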
Since the function is equivalence free, for every $s \in S$,

$$ \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid X_j \in \{s, t(s)\}, \{X_i\}_{i \neq j})\right] > 0. $$

Thus

$$ \delta = \int_{s \in S} \mathrm{E}\left[\mathrm{var}(g(X_1^n) \mid X_j \in \{s, t(s)\}, \{X_i\}_{i \neq j})\right] f_{X_j}(s)\, ds > 0. $$

Finally, $\delta$ is a lower bound to (44) because integrating over $s \in S$ forms the event $A$ and conditioning on $X_j \in \{s, t(s)\}$ is
more restrictive than conditioning on the value of $w_j(X_j)$.



Appendix E 
Proof of Theorem 11

For brevity, the proof will rely on notation and computations from Appendix B, and details closely paralleling Appendix B
are omitted. The basics of using a Taylor expansion with remainder to bound the distortion are unchanged, and the computations
up to (41) do not depend on positivity of the $\lambda_j$s. Evaluating (37) using (41) and the negligibility of the Taylor remainder terms
gives

$$ D \approx \sum_{i_1^n \in \mathcal{I}_1^n} \sum_{j=1}^{n} g_j^2(\beta_{i_1^n}) \cdot \mathrm{E}\left[(X_j - \beta_{i_j}^{(j)})^2 \mid X_1^n \in S_{i_1^n}\right] P(X_1^n \in S_{i_1^n}). $$

In this expression, every term with $S_{i_j}^{(j)} \subset Z_j$ is zero because $g_j(\beta_{i_1^n}) = 0$ by definition of a don't-care interval. Removing
the zero terms, estimating partition cell lengths with point densities, and replacing sums with integrals gives



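$$ D \approx \sum_{j=1}^{n} \frac{1}{12\widetilde{K}_j^2} \int \left(\frac{g_j(x_1^n)}{\lambda_j(x_j)}\right)^2 f_{X_1^n}(x_1^n)\, dx_1^n, $$

where $\widetilde{K}_j$ is the number of cells of $Q_j$ allocated to $[0,1] \setminus Z_j$ and the integration is over $[0,1]^{j-1} \times ([0,1] \setminus Z_j) \times [0,1]^{n-j}$.
This can be expressed in a more conceptually transparent way as

$$ D \approx \sum_{j=1}^{n} \frac{1}{12\widetilde{K}_j^2}\, \mathrm{E}\left[\left(\frac{\gamma_j(X_j)}{\lambda_j(X_j)}\right)^2 1_{\{X_j \notin Z_j\}}\right]. $$

The distortion is minimized by making the $\widetilde{K}_j$s as large as possible, which is to set $\widetilde{K}_j = K_j - M_j$, reserving $M_j$ codewords
to specify the don't-care intervals.$^4$ This proves (28) and (29). The optimization of the point densities for fixed-rate quantization
follows as in Theorem 8, yielding (30). The final expression (31) follows simply by noting that we are considering the limits
as the $K_j$s grow, in which case $2^{R_j} = K_j \approx K_j - M_j$.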

Appendix F 
Proof of Theorem 12

It is already shown in Theorem 11 that it is optimal to allot a single codeword to each don't-care interval and that the
distortion expression (29) then holds. After an appropriate rate analysis, we will optimize the point densities outside of the
don't-care intervals.

The key technical problem is that the rate analysis does not hold when there are intervals where $f_X$ is positive but $\lambda$ is
not. This is easily remedied by only applying it conditioned on $A_j$:

$$ H(\widehat{X}_j \mid A_j) \approx h(X_j \mid A_j) + \log_2(K_j - M_j) + \mathrm{E}\left[\log_2 \lambda_j(X_j) \mid A_j\right]. $$

Now conditioned on $A_j$, the dependence of distortion and rate on $\lambda_j$ is precisely in the standard form of Section IV. Thus,
following Theorem 8, the optimal point density outside of $Z_j$ is given by (19).

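Since the previous results now give the distortion in terms of the conditional entropies $H(\widehat{X}_j \mid A_j)$, what remains is to
relate these to the rates:

$$
\begin{aligned}
R_j &= H(\widehat{X}_j) \\
&\overset{(a)}{=} H(\widehat{X}_j, I_j) \\
&= H(I_j) + H(\widehat{X}_j \mid I_j) \\
&\overset{(b)}{=} H(I_j) + P(A_j)\, H(\widehat{X}_j \mid A_j),
\end{aligned}
$$

where (a) uses that $I_j$ is a deterministic function of $\widehat{X}_j$; and (b) uses that specifying any $I_j \neq 0$ determines $\widehat{X}_j$ uniquely.
Rearranging in anticipation of evaluating the distortion,

$$ \log_2(K_j - M_j) \approx (P(A_j))^{-1}\left(R_j - H(I_j)\right) - h(X_j \mid A_j) - \mathrm{E}\left[\log_2 \lambda_j(X_j) \mid A_j\right]. $$

Now evaluating (29) with optimal point densities (19) gives (32).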
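
The entropy decomposition used in steps (a) and (b) can be checked on a small discrete example. The sketch below is illustrative only: the probability mass function of the quantizer output is made up, with four codewords outside the don't-care intervals (the event $A_j$) and one codeword for each of two don't-care intervals; the function names are assumptions.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a probability mass function given as a list."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def rate_decomposition(p_regular, p_dont_care):
    """Return (H(Xhat), H(I) + P(A) * H(Xhat | A)) for a toy quantizer output.

    p_regular:   probabilities of codewords outside the don't-care intervals
    p_dont_care: probabilities of the single codeword in each don't-care interval
    """
    p_a = sum(p_regular)                                   # P(A): output is a regular codeword
    h_xhat = entropy(list(p_regular) + list(p_dont_care))  # H(Xhat)
    h_i = entropy([p_a] + list(p_dont_care))               # H(I), with I = 0 on the event A
    h_cond = entropy([p / p_a for p in p_regular])         # H(Xhat | A)
    return h_xhat, h_i + p_a * h_cond


lhs, rhs = rate_decomposition([0.10, 0.20, 0.15, 0.05], [0.30, 0.20])
print(lhs, rhs)
```

The two printed values agree because the indicator $I_j$ is a function of the quantizer output and every nonzero value of $I_j$ pins that output down completely, so only the event $A_j$ contributes conditional entropy.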

Appendix G 
A Quasi-Triangle Inequality 

Lemma 15: Let $x$ and $y$ be functions $\mathbb{R} \to \mathbb{R}^+$ with finite $\mathcal{L}^{1/3}$ pseudonorms. Then

$$ \|x\|_{1/3} + \|y\|_{1/3} \geq \tfrac{1}{4}\|x + y\|_{1/3}. $$

Proof: First, we prove the relation $4(a^3 + b^3) \geq (a + b)^3$ for positive real numbers $a$ and $b$:

$$
\begin{aligned}
4(a^3 + b^3) - (a + b)^3 &= 4a^3 + 4b^3 - a^3 - b^3 - 3a^2 b - 3ab^2 \\
&= 3(a + b)(a - b)^2 \geq 0.
\end{aligned}
$$



4 The argument given presumes minimization for given $K_j$s. But it holds for the variable-rate case as well: there is clearly no benefit to splitting don't-care
intervals at a cost of increased rate with no decrease in distortion.

Now by this relation, with $a = \int x(t)^{1/3}\, dt$ and $b = \int y(t)^{1/3}\, dt$:

$$
\begin{aligned}
\|x\|_{1/3} + \|y\|_{1/3} &= \left(\int x(t)^{1/3}\, dt\right)^3 + \left(\int y(t)^{1/3}\, dt\right)^3 \\
&\geq \frac{1}{4}\left(\int \left(x(t)^{1/3} + y(t)^{1/3}\right) dt\right)^3 \\
&\geq \frac{1}{4}\left(\int \left(x(t) + y(t)\right)^{1/3} dt\right)^3 \\
&= \frac{1}{4}\|x + y\|_{1/3},
\end{aligned}
$$

where the second inequality uses, pointwise over $t$, the concavity of the cube-root function on $[0, \infty)$.
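
The lemma is easy to probe numerically on step functions. The following is a minimal sketch under assumed inputs (random step functions on $[0,1]$ with 100 equal cells, and a pair of indicator functions with disjoint supports); the function names are hypothetical and the code is not part of the original proof.

```python
import random


def pseudonorm_one_third(values, dt):
    """L^{1/3} pseudonorm of a nonnegative step function: (integral of h^(1/3))^3."""
    return (sum(v ** (1.0 / 3.0) for v in values) * dt) ** 3


def lemma_sides(x, y, dt):
    """Return (||x|| + ||y||, ||x + y|| / 4) for step functions x and y."""
    lhs = pseudonorm_one_third(x, dt) + pseudonorm_one_third(y, dt)
    rhs = 0.25 * pseudonorm_one_third([a + b for a, b in zip(x, y)], dt)
    return lhs, rhs


n = 100
dt = 1.0 / n
rng = random.Random(1)
x = [rng.random() for _ in range(n)]
y = [rng.random() for _ in range(n)]
print(lemma_sides(x, y, dt))  # left side should be at least the right side

# Disjoint supports with equal pseudonorms: both inequalities in the proof are tight.
x_left = [1.0] * (n // 2) + [0.0] * (n // 2)
y_right = [0.0] * (n // 2) + [1.0] * (n // 2)
print(lemma_sides(x_left, y_right, dt))
```

In the second case the two sides coincide, which suggests that the constant $1/4$ in the lemma cannot be improved in general.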